Generating events in excess of licensed event count

ABSTRACT

In various implementations, a computer-implemented method for remotely managing settings of applications includes receiving a network communication from a managed device, the received network communication including a client-side hash value. The method further includes identifying settings for an application on the managed device in response to the receiving of the network communication, where the identified settings include configuration instructions for the application. Based on a comparison between the received client-side hash value and a server-side hash value that corresponds to the identified settings, at least some of the identified settings are transmitted to the managed device. The transmitting of the at least some of the identified settings can be based on the comparison indicating a mismatch between the received client-side hash value and the server-side hash value. The method may also include completing processing of the received network communication after the transmitting of the at least some of the identified settings.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/701,301, filed Apr. 30, 2015, and titled “Scaling Available Storage Based On Counting Generated Events,” which itself claims the benefit as a continuation-in-part of U.S. patent application Ser. No. 14/691,475, filed Apr. 20, 2015, and now issued as U.S. Pat. No. 10,282,455, the entire contents of each are hereby incorporated by reference herein.

FIELD OF THE DISCLOSURE

The present disclosure generally relates to measuring an amount of data ingested by a data intake and query system and presenting various metrics based on the measured amount.

BACKGROUND

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.

Modern data centers and other computing environments often comprise anywhere from a few devices to thousands of computing devices that process various types of data, service requests from an even larger number of remote clients, and perform many other computing functions. During operation, many of these devices may include components that produce significant volumes of machine-generated data. For example, many of the devices may include components that produce various types of log files, output files, network data, etc.

Analysis of data generated by such computing devices may yield valuable insight into both the overall operation of such computing environments and individual components thereof. However, the unstructured nature of much of this data presents a number of challenges to analysis in part because of the difficulty of applying semantic meaning to unstructured data. Furthermore, the data generated by the computing devices may vary widely both in the type and format of the data. As the number of computing devices that generate various forms of machine data continues to grow, processing and analyzing large volumes of such machine data in an intelligent manner and effectively presenting the results of such analysis remains a priority.

The amount of machine-generated data produced by a computing environment may depend on a number of devices in the computing environment and the types of tasks for which the devices are responsible. For example, a small business may own a relatively small collection of servers and other network devices that collectively produce a relatively small amount of machine-generated data. In contrast, a large corporation may have thousands of devices that produce massive amounts of data on a daily basis. Further, the amount of data generated by either computing environment may vary over time.

Some organizations may not have the resources or desire to manage one or more computing environments in use by the company. For example, a mid-sized company may desire that a third-party service provider manage the security of the company's internal network instead of hiring dedicated personnel to manage the network. In these circumstances and others, an organization may outsource various computing environment management services to a service provider, such as a managed security services provider (MSSP). In the context of network security, for example, an MSSP typically may use security information and event management (SIEM) software to analyze data generated by network hardware and applications for virus and spam blocking, intrusion detection, virtual private network (VPN) management, etc.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings:

FIG. 1 illustrates a networked computer environment in which an embodiment may be implemented;

FIG. 2 illustrates a block diagram of an example data intake and query system in which an embodiment may be implemented;

FIG. 3 is a flow diagram that illustrates how indexers process, index, and store data received from forwarders in accordance with the disclosed embodiments;

FIG. 4 is a flow diagram that illustrates how a search head in conjunction with indexers performs a search query in accordance with the disclosed embodiments;

FIG. 5 illustrates a block diagram of a system for processing search requests that uses extraction rules for field values in accordance with the disclosed embodiments;

FIG. 6 illustrates an example search query received from a client and executed by search peers in accordance with the disclosed embodiments;

FIG. 7A illustrates a search screen in accordance with the disclosed embodiments;

FIG. 7B illustrates a data summary dialog that enables a user to select various data sources in accordance with the disclosed embodiments;

FIG. 8A illustrates a key indicators view in accordance with the disclosed embodiments;

FIG. 8B illustrates an incident review dashboard in accordance with the disclosed embodiments;

FIG. 8C illustrates a proactive monitoring tree in accordance with the disclosed embodiments;

FIG. 8D illustrates a screen displaying both log data and performance data in accordance with the disclosed embodiments;

FIG. 9 illustrates a block diagram of an example cloud-based data intake and query system in which an embodiment may be implemented;

FIG. 10 is a flow diagram that illustrates an example process for calculating a number of events generated by a data intake and query system during one or more defined time periods, in accordance with the disclosed embodiments;

FIG. 11 depicts a screen displaying key indicators and other metrics related to a number of events per various time periods in accordance with the disclosed embodiments;

FIG. 12 depicts a screen displaying input components for specifying one or more thresholds related to key indicators in accordance with the disclosed embodiments; and

FIG. 13 is a block diagram of a computer system upon which embodiments may be implemented.

DETAILED DESCRIPTION

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

Embodiments are described herein according to the following outline:

1.0. General Overview

2.0. Operating Environment

2.1. Environment Overview

2.2. Data Intake and Query System Overview

2.3. Data Server System

2.4. Data Ingestion

2.4.1. Input

2.4.2. Parsing

2.4.3. Indexing

2.5. Query Processing

2.6. Field Extraction

2.7. Example Search Screen

2.8. Acceleration Techniques

2.8.1. Map-Reduce Technique

2.8.2. Keyword Index

2.8.3. High Performance Analytics Store

2.8.4. Accelerating Report Generation

2.9. Security Features

2.10. Data Center Monitoring

2.11. Cloud-Based System Overview

2.12. Other Example Search Support Systems

3.0. Functional Overview

3.1. Collecting Device Data

3.2. Measuring Data Ingestion

3.3. Presenting Data Ingestion Information

3.4. Monitoring Data Ingestion

4.0. Implementation Mechanisms—Hardware Overview

5.0. Example Embodiments

6.0. Extensions and Alternatives

1.0. GENERAL OVERVIEW

This overview presents a basic description of some aspects of a possible embodiment of the present invention. It should be noted that this overview is not an extensive or exhaustive summary of aspects of the possible embodiment. Moreover, it should be noted that this overview is not intended to be understood as identifying any particularly significant aspects or elements of the possible embodiment, nor as delineating any scope of the possible embodiment in particular, nor the invention in general. This overview merely presents some concepts that relate to the example possible embodiment in a condensed and simplified format, and should be understood as merely a conceptual prelude to a more detailed description of example possible embodiments that follows below.

According to various embodiments, systems and techniques are described for a data intake and query system to measure an amount of raw data ingested by the data intake and query system during one or more defined periods of time. As used herein, a data intake and query system ingesting raw data generally refers to the system receiving the raw data from one or more computing devices and processing the data for storage and searchability. Processing the raw data may include, for example, parsing the raw data into “events,” where each event includes a portion of the received raw data and is associated with a timestamp. Each event may further be associated with additional metadata describing the event, including host information identifying a particular device generating the raw data, source information identifying a pathname or other source identifier for the event, and source type information identifying a data format associated with the raw data. Processing the raw data may further include storing the events in one or more indexes that facilitate processing of search queries on the events, where each index is a data repository that stores a particular collection of events.

In one embodiment, measuring an amount of raw data ingested by a data intake and query system includes calculating a number of events generated by the system from the raw data during one or more defined time periods. For example, as a data intake and query system processes raw data received from one or more devices, the system may count a number of events generated from the raw data and track when each event is generated. Based on the number of events generated, the system may calculate various metrics including, but not limited to, a number of events generated by the system during a particular day, a number of events generated per day over a period of time, a maximum number of events generated in a day over a period of time, an average number of events generated per day, etc. Although the example metrics are relative to a time period of a day, any other time period may be used including seconds, hours, weeks, years, etc.
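The per-day metrics described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function names `events_per_day` and `daily_metrics` are hypothetical, event generation times are assumed to be timezone-aware, days are assumed to be UTC calendar days, and the average is taken only over days on which at least one event was generated.

```python
from collections import Counter
from datetime import datetime, timezone


def events_per_day(generation_times):
    """Count events by the UTC calendar day on which each was generated.

    Assumes every datetime is timezone-aware.
    """
    return Counter(t.astimezone(timezone.utc).date() for t in generation_times)


def daily_metrics(per_day):
    """Summarize a per-day count mapping with max, min, and average."""
    counts = list(per_day.values())
    return {
        "max_per_day": max(counts),
        "min_per_day": min(counts),
        # Average over days that appear in the mapping (days with events).
        "avg_per_day": sum(counts) / len(counts),
    }


# Illustrative data: two events on one day, one on the next.
times = [
    datetime(2015, 4, 28, 9, 0, tzinfo=timezone.utc),
    datetime(2015, 4, 28, 17, 30, tzinfo=timezone.utc),
    datetime(2015, 4, 29, 8, 15, tzinfo=timezone.utc),
]
per_day = events_per_day(times)
metrics = daily_metrics(per_day)
```

Swapping the `date()` key for an hour or week key would give the same metrics over a different time period.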

In one embodiment, the raw data received by a data intake and query system may include data produced by computing devices under the management of a managed security service provider (MSSP) or other entity that manages a collection of computing devices. For example, the devices may include network devices, servers, and other computing devices that produce various types of raw data such as log files, system files, network events, etc. The raw data may be sent to the data intake and query system for processing, and users may use various interfaces provided by the system to perform various tasks related to the data including, for example, viewing visualizations indicating information about data ingestion rates, searching the data, viewing detected possible threats to network security, etc.

In an embodiment, calculating a number of events generated by a data intake and query system during various time periods may be used for a number of different purposes. As one example, a calculated number of events generated during a particular day may be used as part of a visualization that provides information to a user associated with the devices producing the raw data. One example visualization may include a “dashboard” interface displaying various metrics such as, for example, a maximum number of events generated per day, a minimum number of events generated per day, an average number of events generated per day, event generation trend data, etc. Users may use this information to manage and monitor an amount of raw data ingested by a data intake and query system. As another example, a number of events generated by a data intake and query system during a particular time period may be used to calculate a fee to charge a user associated with the devices producing the raw data from which the events are derived. As yet another example, a number of events generated during a particular time period may be used to generate alerts to inform users of unusual activity related to the number of events generated by the system.

Other embodiments include, without limitation, a non-transitory computer-readable medium that includes processor-executable instructions that enable a processing unit to implement one or more aspects of the disclosed methods as well as a system configured to implement one or more aspects of the disclosed methods.

2.0. OPERATING ENVIRONMENT

2.1. Environment Overview

FIG. 1 illustrates a networked computer system 100 in which an embodiment may be implemented. FIG. 1 represents one example embodiment that is provided for purposes of illustrating a clear example; other embodiments may use different arrangements.

The networked computer system 100 comprises one or more computing devices. These one or more computing devices comprise any combination of hardware and software configured to implement the various logical components described herein. For example, the one or more computing devices may include one or more memories storing instructions for implementing the various components described herein, one or more hardware processors configured to execute the instructions stored in the one or more memories, and various data repositories in the one or more memories for storing data structures utilized and manipulated by the various components.

In an embodiment, one or more devices 102 are coupled to a data intake and query system 106 via one or more networks 104. Networks 104 broadly represent one or more LANs, WANs, cellular networks (e.g., LTE, HSPA, 3G, and other cellular technologies), and/or internetworks using any of wired, wireless, terrestrial microwave, or satellite links, and may include the public Internet. Each device 102 may comprise, for example, one or more of a network device, a web server, an application server, a database server, etc. Other examples of devices 102 may include, without limitation, smart phones, tablet computers, other handheld computers, wearable devices, laptop computers, desktop computers, servers, portable media players, gaming devices, and so forth.

Each of devices 102 may generate various types of data during operation, including event logs, network data, sensor data, and other types of machine-generated data. For example, a device 102 comprising a web server may generate one or more web server logs in which details of interactions between the web server and other devices are recorded. As another example, a device 102 comprising a router may generate one or more router logs that record information related to network traffic managed by the router. As yet another example, a device 102 comprising a database server may generate one or more logs that record information related to requests sent from other devices (e.g., other web servers or application servers) for data managed by the database server. In an embodiment, data produced by the devices may be sent to a data intake and query system 106 via the one or more networks 104 for processing, as described in more detail hereinafter. As yet another example, data may include user-generated data, such as analyst log files including data input by a user, audit logs, etc.

In an embodiment, one or more devices 102 may belong to a device group (e.g., one of device groups 108A-C). Each device group generally may represent a logical grouping of one or more devices. For example, each device group may correspond to a collection of devices belonging to a particular entity (e.g., a particular business or other organization), devices that collectively implement a particular function (e.g., a collection of devices implementing a web-based application), or based on any other device characteristics. In one embodiment, a service provider may manage separate device groups on behalf of companies or other organizations that own the devices. As one example, service provider 110A may represent an MSSP that is responsible for managing two different collections of network devices belonging to two separate companies, represented by a device group 108A and a device group 108B. In one embodiment, each device group may correspond to a separate “project” at the data intake and query system, where the system stores and monitors data associated with each project in a logically separate manner. A service provider 110A may group the devices into separate device groups 108A and 108B, for example, to separately store, monitor, and interact with data produced by devices within each group. As another example, a service provider 110A may manage two or more separate device groups, where each device in the device groups belongs to the same company but may implement different functions. For example, one device group may represent network devices implementing a company's east coast operations, while a second device group represents network devices implementing the same company's west coast operations. Similar to above, a service provider may configure a data intake and query system such that each device group corresponds to a separate project that enables separate monitoring of data originating from each group.

2.2. Data Intake and Query System Overview

Data intake and query system 106 generally represents a data analysis system that is configured to consume and analyze machine-generated data, such as performance data that may be generated by one or more devices 102. Analyzing massive quantities of machine data, such as performance data that may be generated by a large number of devices 102, presents a number of challenges, including ingesting the large quantities of data that may be generated by devices 102, and storing the data in a manner that enables efficient analysis.

In one embodiment, these challenges can be addressed by using an event-based data intake and query system, such as the SPLUNK® ENTERPRISE system produced by Splunk Inc. of San Francisco, Calif. The SPLUNK® ENTERPRISE system is the leading platform for providing real-time operational intelligence that enables organizations to collect, index, and search machine-generated data from various websites, applications, servers, networks, and mobile devices that power their businesses. The SPLUNK® ENTERPRISE system is particularly useful for analyzing unstructured data, which is commonly found in system and application log files, network data, and other data input sources. Although many of the techniques described herein are explained with reference to a data intake and query system similar to the SPLUNK® ENTERPRISE system, the techniques are also applicable to other types of data systems.

In the SPLUNK® ENTERPRISE system, machine-generated data is collected and stored as “events,” where each event comprises a portion of the machine-generated data and is associated with a specific point in time. For example, events may be derived from “time series data,” where the time series data comprises a sequence of data points (e.g., performance measurements from a computer system, etc.) that are associated with successive points in time. In general, each event can be associated with a timestamp that is derived from the raw data in the event, determined through interpolation between temporally proximate events having known timestamps, determined based on other configurable rules for assigning timestamps to events, etc.

Events can be derived from either “structured” or “unstructured” machine data. In general, structured data has a predefined format, where data items with specific data formats are stored at predefined locations in the data. For example, structured data may include data stored as fields in a database table. In contrast, unstructured data may not have a predefined format. This means that unstructured data can comprise various data items of different data types that may be stored at different locations within the data. For example, when the data source is an operating system log, an event can include one or more lines from the operating system log containing raw data that includes different types of performance and diagnostic information associated with a specific point in time.

Examples of components which may generate machine data from which events may be derived include, but are not limited to, web servers, application servers, databases, firewalls, routers, operating systems, and software applications that execute on computer systems, mobile devices, and sensors. The data generated by such data sources can include, for example and without limitation, server log files, activity log files, configuration files, messages, network packet data, performance measurements, and sensor measurements.

The SPLUNK® ENTERPRISE system also facilitates using a flexible schema to specify how to extract information from the event data, where the flexible schema may be developed and redefined as needed. Note that a flexible schema may be applied to event data “on the fly,” when it is needed (e.g., at search time, etc.), rather than at ingestion time of the data as in traditional database systems. Because the schema is not applied to event data until it is needed (e.g., at search time, etc.), it may be referred to as a “late-binding schema.”

During operation, the SPLUNK® ENTERPRISE system starts with raw input data (e.g., one or more log files, a stream of network data, sensor data, any data stream, etc.). The system divides this raw data into blocks, and parses the data to produce timestamped events. The system stores the timestamped events in one or more data stores, and enables users to run queries against the stored data to retrieve events that meet criteria specified in a query, such as containing certain keywords or having specific values in defined fields. In this context, the term “field” refers to a location in the event data containing a value for a specific data item.

As noted above, the SPLUNK® ENTERPRISE system facilitates using a late-binding schema while performing queries on events. One aspect of a late-binding schema is “extraction rules” that are applied to data in the events to extract values for specific fields. More specifically, the extraction rules for a field can include one or more instructions that specify how to extract a value for the field from the event data. An extraction rule can generally include any type of instruction for extracting values from data in events. In some cases, an extraction rule comprises a regular expression, in which case the rule is referred to as a “regex rule.” In the SPLUNK® ENTERPRISE system, a field extractor may be configured to automatically generate extraction rules for certain fields in the events when the events are being created, indexed, or stored, or possibly at a later time. Alternatively, a user may manually define extraction rules for fields using a variety of techniques. In contrast to a conventional schema for a database system, a late-binding schema is not defined at data ingestion time. Instead, the late-binding schema can be developed on an ongoing basis until the time a query is actually executed. This means that extraction rules for the fields in a query may be provided in the query itself, or may be located during execution of the query. Hence, as an analyst learns more about the data in the events, the analyst can continue to refine the late-binding schema by adding new fields, deleting fields, or modifying the field extraction rules for use the next time the schema is used by the system. Because the SPLUNK® ENTERPRISE system maintains the underlying raw data and provides a late-binding schema for searching the raw data, it enables an analyst to investigate questions that arise as the analyst learns more about the events.
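To make the idea of regex-based extraction rules applied at search time concrete, the following is a minimal sketch. It is not the SPLUNK® ENTERPRISE implementation: the rule table `EXTRACTION_RULES`, the field names, the sample log lines, and the helper functions are all hypothetical, and the raw events are kept unmodified so that the rules can be changed between searches.

```python
import re

# Hypothetical extraction rules: field name -> regex with one capture group.
# Because they are applied only at search time, they can be edited at will
# without re-ingesting the raw events.
EXTRACTION_RULES = {
    "status": re.compile(r"\bstatus=(\d{3})\b"),
    "client_ip": re.compile(r"\bclient=(\d{1,3}(?:\.\d{1,3}){3})\b"),
}


def extract_fields(raw_event, rules):
    """Apply each rule to the raw event text and collect matched values."""
    fields = {}
    for name, pattern in rules.items():
        match = pattern.search(raw_event)
        if match:
            fields[name] = match.group(1)
    return fields


def search(raw_events, rules, field, value):
    """Return raw events whose extracted `field` equals `value`."""
    return [e for e in raw_events if extract_fields(e, rules).get(field) == value]


events = [
    "2015-04-30T10:00:00Z GET /index.html status=200 client=10.0.1.2",
    "2015-04-30T10:00:05Z GET /missing status=404 client=10.0.1.7",
]
matches = search(events, EXTRACTION_RULES, "status", "404")
```

Adding a new field later only requires adding an entry to the rule table; the stored raw text does not change.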

In some embodiments, a common field name may be used to reference two or more fields containing equivalent data items, even though the fields may be associated with different types of events that possibly have different data formats and different extraction rules. By enabling a common field name to be used to identify equivalent fields from different types of events generated by different data sources, the system facilitates use of a “common information model” (CIM) across the different data sources.

2.3. Data Server System

FIG. 2 depicts a block diagram of an example data intake and query system 106, similar to the SPLUNK® ENTERPRISE system. System 106 includes one or more forwarders 204 that consume data from a variety of input data sources 202, and one or more indexers 206 that process and store the data in one or more data stores 208. These forwarders and indexers can comprise separate computer systems, or may alternatively comprise separate processes executing on one or more computer systems.

Each data source 202 broadly represents a source of data that can be consumed by a system 106. Examples of a data source 202 include, without limitation, data files, directories of files, data sent over a network, event logs, and registries. Each data source 202, for example,

During operation, the forwarders 204 identify which indexers 206 receive data collected from a data source 202 and forward the data to the appropriate indexers. Forwarders 204 can also perform operations on the data before forwarding, including removing extraneous data, detecting timestamps in the data, and/or performing other data transformations.

In an embodiment, a forwarder 204 may comprise a service accessible to devices 102 via a network 104. For example, one type of forwarder 204 may be capable of consuming vast amounts of real-time data from a potentially large number of devices 102. The forwarder 204 may, for example, comprise a computing device which implements multiple data pipelines or “queues” to handle forwarding of network data to indexers 206. Techniques for efficiently forwarding data through a data forwarder are described in U.S. Provisional Appl. 62/053,101, entitled “DATA FORWARDING USING MULTIPLE DATA PIPELINES”, filed on 19 Sep. 2014, and which is hereby incorporated by reference in its entirety for all purposes.

2.4. Data Ingestion

FIG. 3 depicts a flow chart illustrating an example data flow within a data intake and query system 106, in accordance with the disclosed embodiments. The data flow illustrated in FIG. 3 is provided for illustrative purposes only; one or more of the steps of the processes illustrated in FIG. 3 may be removed or the ordering of the steps may be changed. Furthermore, for the purposes of illustrating a clear example, one or more particular system components are described as performing various operations during each of the data flow stages. For example, a forwarder is described as receiving and processing data during an input phase, an indexer is described as parsing and indexing data during parsing and indexing phases, and a search head is described as performing a search query during a search phase. However, it is noted that other system arrangements and distributions of the processing steps across system components may be used.

2.4.1. Input

At block 302, a forwarder receives data from an input source. A forwarder, for example, initially may receive the data as a raw data stream generated by the input source. For example, a forwarder may receive a data stream from a log file generated by an application server, from a stream of network data from a network device, or from any other source of data. In one embodiment, a forwarder receives the raw data and may segment the data stream into “blocks,” possibly of a uniform data size, to facilitate subsequent processing steps.

At block 304, a forwarder or other system component annotates each block generated from the raw data with one or more metadata fields. These metadata fields may, for example, provide information related to the data block as a whole and which apply to each event that is subsequently derived from the data block, as described in more detail below. For example, the metadata fields may include separate fields specifying each of a host, a source, and a source type related to the data block. A host field, for example, may contain a value identifying a host name or IP address of a device that generated the data. A source field may contain a value identifying a source of the data, such as a pathname of a file or a protocol and port related to received network data. A source type field may contain a value specifying a particular source type label for the data. Additional metadata fields may also be included during the input phase, such as a character encoding of the data if known, and possibly other values that provide information relevant to later processing steps. In an embodiment, a forwarder forwards the data to another system component for further processing, typically forwarding the annotated data blocks to an indexer.
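The input phase of blocks 302 and 304 can be sketched as follows, assuming uniform-size blocks and the three metadata fields named above. The `Block` type, the function names, and the sample host, source, and source type values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A chunk of raw data plus metadata that applies to the whole chunk."""
    data: bytes
    host: str
    source: str
    sourcetype: str


def segment(stream: bytes, block_size: int):
    """Split a raw byte stream into blocks of at most `block_size` bytes."""
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def annotate(chunks, host, source, sourcetype):
    """Attach the same host, source, and source type to every block."""
    return [Block(chunk, host, source, sourcetype) for chunk in chunks]


raw = b"line one\nline two\nline three\n"
blocks = annotate(segment(raw, 16), "web01", "/var/log/app.log", "app_log")
```

Note that a fixed-size split can cut a line in half; in this sketch, reassembling events across block edges is left to the later parsing step.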

2.4.2. Parsing

At block 306, an indexer receives data blocks from a forwarder and parses the data to organize the data into events. In an embodiment, to organize the data into events, an indexer may determine a source type associated with each data block (e.g., by extracting a source type label from the metadata fields associated with the data block) and refer to a source type configuration corresponding to the identified source type. The source type definition may include one or more properties that indicate to the indexer the boundaries of events in the data. In general, these properties may include regular expression-based rules or delimiter rules where, for example, event boundaries may be indicated by predefined characters or character strings. These predefined characters may include punctuation marks or other special characters including, for example, carriage returns, tabs, spaces, or line breaks. If a source type for the data is unknown to the indexer, an indexer may infer a source type for the data by examining the structure of the data and apply an inferred source type definition to the data to create the events.
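As an illustration of a regular expression-based boundary rule, the sketch below treats any line that begins with a date as the start of a new event, so that continuation lines stay attached to the event they follow. The `SOURCE_TYPES` table, the `app_log` label, and the `event_start` property are hypothetical names, not part of any real configuration format, and text before the first match is simply dropped in this simplified version.

```python
import re

# Hypothetical source type definitions: each one names a regex that marks
# where a new event begins.
SOURCE_TYPES = {
    "app_log": {"event_start": re.compile(r"^\d{4}-\d{2}-\d{2} ", re.MULTILINE)},
}


def break_events(text, sourcetype):
    """Split block text into events at each match of the boundary regex."""
    pattern = SOURCE_TYPES[sourcetype]["event_start"]
    starts = [m.start() for m in pattern.finditer(text)]
    if not starts:
        # No boundary found: treat non-empty text as a single event.
        return [text] if text.strip() else []
    bounds = starts + [len(text)]
    return [text[a:b].rstrip("\n") for a, b in zip(bounds, bounds[1:])]


block = (
    "2015-04-30 10:00:00 ERROR failed\n"
    "  Traceback line 1\n"
    "2015-04-30 10:00:01 INFO recovered\n"
)
events = break_events(block, "app_log")
```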

At block 308, the indexer determines a timestamp for each event. Similar to the process for creating events, an indexer may again refer to a source type definition associated with the data to locate one or more properties that indicate instructions for determining a timestamp for each event. The properties may, for example, instruct an indexer to extract a time value from a portion of data in the event, to interpolate time values based on timestamps associated with temporally proximate events, to create a timestamp based on a time the event data was received or generated, to use the timestamp of a previous event, or to apply any other rules for determining timestamps.
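A simple fallback chain over three of the strategies listed above might look like the following sketch: extract a time value from the event text, otherwise reuse the previous event's timestamp, otherwise use the time the data was received. The pattern `TS_PATTERN`, the function name, and the assumption that extracted times are in UTC are all illustrative choices; interpolation between neighboring events is not shown.

```python
import re
from datetime import datetime, timezone

# Hypothetical timestamp format for this sketch: "YYYY-MM-DD HH:MM:SS".
TS_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def assign_timestamps(events, received_at):
    """Pair each event with a timestamp using a fixed fallback order."""
    stamped = []
    previous = None
    for text in events:
        match = TS_PATTERN.search(text)
        if match:
            # 1. Extract a time value from the event data (assumed UTC).
            ts = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
            ts = ts.replace(tzinfo=timezone.utc)
        elif previous is not None:
            # 2. Reuse the timestamp of the previous event.
            ts = previous
        else:
            # 3. Fall back to the time the data was received.
            ts = received_at
        stamped.append((ts, text))
        previous = ts
    return stamped


received = datetime(2015, 5, 1, tzinfo=timezone.utc)
stamped = assign_timestamps(
    ["no time here", "2015-04-30 10:00:00 started", "continuation"], received
)
```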

At block 310, the indexer associates with each event one or more metadata fields including a field containing the timestamp determined for the event. These metadata fields may include a number of “default fields” that are associated with all events, and may also include one or more custom fields as defined by a user. Similar to the metadata fields associated with the data blocks at block 304, the default metadata fields associated with each event may include a host, source, and source type field in addition to a field storing the timestamp.

At block 312, an indexer may optionally apply one or more transformations to data included in the events created at block 306. For example, such transformations can include removing a portion of an event (e.g., a portion used to define event boundaries, other extraneous text, etc.), masking a portion of an event (e.g., masking a credit card number), or removing redundant portions of an event. The transformations applied to event data may, for example, be specified in one or more configuration files and referenced by one or more source type definitions.

2.4.3. Indexing

At blocks 314 and 316, an indexer can optionally generate a keyword index to facilitate fast keyword searching for event data. To build a keyword index, at block 314, the indexer identifies a set of keywords in each event. At block 316, the indexer includes the identified keywords in an index, which associates each stored keyword with reference pointers to events containing that keyword (or to locations within events where that keyword is located, other location identifiers, etc.). When an indexer subsequently receives a keyword-based query, the indexer can access the keyword index to quickly identify events containing the keyword.

In some embodiments, the keyword index may include entries for name-value pairs found in events, where a name-value pair can include a pair of keywords connected by a symbol, such as an equals sign or colon. In this way, events containing these name-value pairs can be quickly located. In some embodiments, fields can automatically be generated for some or all of the name-value pairs at the time of indexing. For example, if the string “dest=10.0.1.2” is found in an event, a field named “dest” may be created for the event, and assigned a value of “10.0.1.2”.
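
As a non-limiting illustration, a keyword index of the kind described above can be sketched as an inverted index from keywords to event references. The tokenization rules, the function name, and the use of list positions as event references are assumptions for this example; entries for name-value pairs are stored under a normalized “name=value” key.

```python
import re
from collections import defaultdict

KEYWORD_PATTERN = re.compile(r"[A-Za-z0-9_.]+")
# A name-value pair: two keywords joined by an equals sign or a colon.
PAIR_PATTERN = re.compile(r"([A-Za-z0-9_]+)[=:]([A-Za-z0-9_.]+)")


def build_keyword_index(events):
    """Map each keyword and each name=value pair to the positions of events containing it."""
    index = defaultdict(set)
    for position, event in enumerate(events):
        for keyword in KEYWORD_PATTERN.findall(event):
            index[keyword].add(position)
        for name, value in PAIR_PATTERN.findall(event):
            index[f"{name}={value}"].add(position)
    return index


events = [
    "action=blocked dest=10.0.1.2",
    "action=allowed dest=10.0.1.9",
    "error connecting to 10.0.1.2",
]
index = build_keyword_index(events)
```

With this layout, a keyword-based query reduces to a dictionary lookup, so events containing “10.0.1.2” or the pair “dest=10.0.1.2” are found without scanning every event.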

At block 318, the indexer stores the events in a data store, where a timestamp can be stored with each event to facilitate searching for events based on a time range. In one embodiment, the stored events are organized into “buckets,” where each bucket stores events associated with a specific time range based on the timestamps associated with each event. This may not only improve time-based searching, but also allow for events with recent timestamps, which may have a higher likelihood of being accessed, to be stored in faster memory to facilitate faster retrieval. For example, buckets containing the most recent events can be stored in flash memory instead of on hard disk.

Each indexer 206 may be responsible for storing and searching a subset of the events contained in a corresponding data store 208. By distributing events among the indexers and data stores, the indexers can analyze events for a query in parallel, for example, using map-reduce techniques, wherein each indexer returns partial responses for a subset of events to a search head that combines the results to produce an answer for the query. By storing events in buckets for specific time ranges, an indexer may further optimize searching by looking only in buckets for time ranges that are relevant to a query.
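
As a non-limiting illustration, time-range buckets and bucket pruning can be sketched as follows. The one-hour bucket span, the use of integer epoch seconds, and the function names are assumptions for this example; the point is only that a time-bounded search can skip buckets whose range does not overlap the query.

```python
from collections import defaultdict

BUCKET_SPAN_SECONDS = 3600  # hypothetical bucket width of one hour


def bucket_start(timestamp):
    """Return the start of the bucket that covers the given epoch timestamp."""
    return timestamp - (timestamp % BUCKET_SPAN_SECONDS)


def store_events(events):
    """Group (timestamp, raw) events into buckets keyed by bucket start time."""
    buckets = defaultdict(list)
    for timestamp, raw in events:
        buckets[bucket_start(timestamp)].append((timestamp, raw))
    return buckets


def search_time_range(buckets, earliest, latest):
    """Return events with earliest <= timestamp <= latest, skipping irrelevant buckets."""
    matches = []
    for start in sorted(buckets):
        # Skip buckets that end before the range starts or start after it ends.
        if start + BUCKET_SPAN_SECONDS <= earliest or start > latest:
            continue
        matches.extend(e for e in buckets[start] if earliest <= e[0] <= latest)
    return matches


buckets = store_events([(100, "a"), (3700, "b"), (7300, "c")])
```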

Moreover, events and buckets can also be replicated across different indexers and data stores to facilitate high availability and disaster recovery as is described in U.S. patent application Ser. No. 14/266,812, filed on 30 Apr. 2014, and in U.S. patent application Ser. No. 14/266,817, also filed on 30 Apr. 2014, each of which is hereby incorporated by reference in its entirety for all purposes.

2.5. Query Processing

FIG. 4 is a flow diagram that illustrates an example process that a search head and one or more indexers may perform during a search query. At block 402, a search head receives a search query from a client. At block 404, the search head analyzes the search query to determine what portions can be delegated to indexers and what portions can be executed locally by the search head. At block 406, the search head distributes the determined portions of the query to the appropriate indexers.

At block 408, the indexers to which the query was distributed search their data stores for events that are responsive to the query. To determine which events are responsive to the query, the indexer searches for events that match the criteria specified in the query. These criteria can include matching keywords or specific values for certain fields. In searches that use a late-binding schema, the searching operations at block 408 may involve using the late-binding schema to extract values for specified fields from events at the time the query is processed. In an embodiment, one or more rules for extracting field values may be specified as part of a source type definition. The indexers may then either send the relevant events back to the search head, or use the events to calculate a partial result, and send the partial result back to the search head.

At block 410, the search head combines the partial results and/or events received from the indexers to produce a result for the query. This result may comprise different types of data depending on what the query requested. For example, the results can include a listing of matching events returned by the query, or some type of visualization of the data from the returned events. In another example, the final result can include one or more calculated values derived from the matching events.

The results generated by the system 106 can be returned to a client using different techniques. For example, one technique streams results back to a client in real-time as they are identified. Another technique waits to report the results to the client until a complete set of results is ready to return to the client. Yet another technique streams interim results back to the client in real-time until a complete set of results is ready, and then returns the complete set of results to the client. In another technique, certain results are stored as “search jobs” and the client may retrieve the results by referring to the search jobs.

The search head can also perform various operations to make the search more efficient. For example, before the search head begins execution of a query, the search head can determine a time range for the query and a set of common keywords that all matching events include. The search head may then use these parameters to query the indexers to obtain a superset of the eventual results. Then, during a filtering stage, the search head can perform field-extraction operations on the superset to produce a reduced set of search results.

2.6. Field Extraction

FIG. 5 illustrates an example of applying extraction rules to a search query received from a client. At the start of the process, a search query 502 is received at a query processor 504. Query processor 504 includes various mechanisms for processing a query and may reside in a search head 210 and/or an indexer 206. Note that the example search query 502 illustrated in FIG. 5 is expressed in Search Processing Language (SPL), which is used in conjunction with the SPLUNK® ENTERPRISE system. SPL is a pipelined search language in which a set of inputs is operated on by a first command in a command line, and then a subsequent command following the pipe symbol “|” operates on the results produced by the first command, and so on for additional commands. Search query 502 can also be expressed in other query languages, such as the Structured Query Language (“SQL”) or any other query language.

In response to receiving search query 502, query processor 504 determines that search query 502 refers to two fields: “IP” and “target.” Query processor 504 also determines that the values for the “IP” and “target” fields have not already been extracted from events stored in a data store 514, and consequently determines that query processor 504 can use extraction rules to extract values for the fields. Hence, query processor 504 performs a lookup for the extraction rules in a rule base 506. For example, rule base 506 may include a source type definition, where the source type definition includes extraction rules for various different source types. The query processor 504 obtains extraction rules 508-509, wherein extraction rule 508 specifies how to extract a value for the “IP” field from an event, and extraction rule 509 specifies how to extract a value for the “target” field from an event. As is illustrated in FIG. 5, extraction rules 508-509 can comprise regular expressions that specify how to extract values for the relevant fields. Such regular expression-based extraction rules are also referred to as “regex rules.” In addition to specifying how to extract field values, the extraction rules may also include instructions for deriving a field value by performing a function on a character string or value retrieved by the extraction rule. For example, a transformation rule may truncate a character string, or convert the character string into a different data format. In some cases, the query itself can specify one or more extraction rules.

Next, query processor 504 sends extraction rules 508-509 to a field extractor 512, which applies extraction rules 508-509 to events 516-518 in a data store 514. Note that data store 514 can include one or more data stores, and extraction rules 508-509 can be applied to large numbers of events in data store 514, and are not meant to be limited to the three events 516-518 illustrated in FIG. 5. Moreover, the query processor 504 can instruct field extractor 512 to apply the extraction rules to all the events in a data store 514, or to a subset of the events that have been filtered based on some criteria.

Next, field extractor 512 applies extraction rule 508 for the first command “Search IP="10*"” to events in data store 514 including events 516-518. Extraction rule 508 is used to extract values for the IP address field from events in data store 514 by looking for a pattern of one or more digits, followed by a period, followed again by one or more digits, followed by another period, followed again by one or more digits, followed by another period, and followed again by one or more digits. Next, field extractor 512 returns field values 520 to query processor 504, which uses the criterion IP="10*" to look for IP addresses that start with “10”. Note that events 516 and 517 match this criterion, but event 518 does not, so the result set for the first command includes events 516-517.

Query processor 504 then sends events 516-517 to the next command “stats count target.” To process this command, query processor 504 causes field extractor 512 to apply extraction rule 509 to events 516-517. Extraction rule 509 is used to extract values for the target field for events 516-517 by skipping the first four commas in events 516-517, and then extracting all of the following characters until a comma or period is reached. Next, field extractor 512 returns field values 521 to query processor 504, which executes the command “stats count target” to count the number of unique values contained in the target fields, which in this example produces the value “2” that is returned as a final result 522 for the query.
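
As a non-limiting illustration, the two extraction rules and the two pipelined commands described above can be sketched with regular expressions. The contents of events 516-518 are not reproduced in this text, so the three comma-separated event strings below are hypothetical stand-ins, as are the variable and function names; only the shape of the rules follows the description.

```python
import re
from fnmatch import fnmatchcase

# Sketch of extraction rule 508: digits, period, digits, period, digits, period, digits.
IP_RULE = re.compile(r"\d+\.\d+\.\d+\.\d+")
# Sketch of extraction rule 509: skip the first four commas, then capture
# everything up to the next comma or period.
TARGET_RULE = re.compile(r"^(?:[^,]*,){4}([^,.]*)")

# Hypothetical stand-ins for events 516, 517, and 518.
events = [
    "Mon,10.0.1.2,GET,200,serverA,ok",
    "Mon,10.0.1.7,GET,200,serverB,ok",
    "Mon,192.168.1.5,GET,500,serverA,fail",
]


def extract(rule, event):
    """Apply an extraction rule at search time; return None when it does not match."""
    match = rule.search(event)
    if match is None:
        return None
    return match.group(1) if rule.groups else match.group(0)


# First command: keep events whose extracted IP value matches the wildcard "10*".
matching = [e for e in events if fnmatchcase(extract(IP_RULE, e) or "", "10*")]

# Second command: count the unique values of the extracted target field.
unique_targets = {extract(TARGET_RULE, e) for e in matching}
final_result = len(unique_targets)
```

Under these assumed events, the first command keeps the first two events and the second command yields a final result of 2, mirroring the walk-through above.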

Note that query results can be returned to a client, a search head, or any other system component for further processing. In general, query results may include a set of one or more events, a set of one or more values obtained from the events, a subset of the values, statistics calculated based on the values, a report containing the values, or a visualization, such as a graph or chart, generated from the values.

2.7. Example Search Screen

FIG. 7A illustrates an example search screen 700 in accordance with the disclosed embodiments. Search screen 700 includes a search bar 702 that accepts user input in the form of a search string. It also includes a time range picker 712 that enables the user to specify a time range for the search. For “historical searches” the user can select a specific time range, or alternatively a relative time range, such as “today,” “yesterday” or “last week.” For “real-time searches,” the user can select the size of a preceding time window to search for real-time events. Search screen 700 also initially displays a “data summary” dialog as is illustrated in FIG. 7B that enables the user to select different sources for the event data, for example by selecting specific hosts and log files.

After the search is executed, the search screen 700 can display the results through search results tabs 704, wherein search results tabs 704 include: an “events tab” that displays various information about events returned by the search; a “statistics tab” that displays statistics about the search results; and a “visualization tab” that displays various visualizations of the search results. The events tab illustrated in FIG. 7A displays a timeline graph 705 that graphically illustrates the number of events that occurred in one-hour intervals over the selected time range. It also displays an events list 708 that enables a user to view the raw data in each of the returned events. It additionally displays a fields sidebar 706 that includes statistics about occurrences of specific fields in the returned events, including “selected fields” that are pre-selected by the user, and “interesting fields” that are automatically selected by the system based on pre-specified criteria.

2.8. Acceleration Technique

The above-described system provides significant flexibility by enabling a user to analyze massive quantities of minimally processed performance data “on the fly” at search time instead of storing pre-specified portions of the performance data in a database at ingestion time. This flexibility enables a user to see correlations in the performance data and perform subsequent queries to examine interesting aspects of the performance data that may not have been apparent at ingestion time.

However, performing extraction and analysis operations at search time can involve a large amount of data and require a large number of computational operations, which can cause considerable delays while processing the queries. Fortunately, a number of acceleration techniques have been developed to speed up analysis operations performed at search time. These techniques include: (1) performing search operations in parallel by formulating a search as a map-reduce computation; (2) using a keyword index; (3) using a high performance analytics store; and (4) accelerating the process of generating reports. These techniques are described in more detail below.

2.8.1. Map-Reduce Technique

To facilitate faster query processing, a query can be structured as a map-reduce computation, wherein the “map” operations are delegated to the indexers, while the corresponding “reduce” operations are performed locally at the search head. For example, FIG. 6 illustrates how a search query 602 received from a client at a search head 210 can be split into two phases, including: (1) a “map phase” comprising subtasks 604 (e.g., data retrieval or simple filtering) that may be performed in parallel and are “mapped” to indexers 206 for execution, and (2) a “reduce phase” comprising a merging operation 606 to be executed by the search head when the results are ultimately collected from the indexers.

During operation, upon receiving search query 602, a search head 210 modifies search query 602 by substituting “stats” with “prestats” to produce search query 604, and then distributes search query 604 to one or more distributed indexers, which are also referred to as “search peers.” Note that search queries may generally specify search criteria or operations to be performed on events that meet the search criteria. Search queries may also specify field names, as well as search criteria for the values in the fields or operations to be performed on the values in the fields. Moreover, the search head may distribute the full search query to the search peers as is illustrated in FIG. 4, or may alternatively distribute a modified version (e.g., a more restricted version) of the search query to the search peers. In this example, the indexers are responsible for producing the results and sending them to the search head. After the indexers return the results to the search head, the search head performs the merging operations 606 on the results. Note that by executing the computation in this way, the system effectively distributes the computational operations while minimizing data transfers.
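
As a non-limiting illustration, the split between the map phase and the reduce phase can be sketched as follows. The event layout, the keyword filter, and the function names are hypothetical; the sketch shows each indexer returning only a compact partial count over its own events, with the search head merging the partial counts.

```python
from collections import Counter


def map_phase(indexer_events, keyword):
    """Run on each indexer: partial per-host counts of events containing the keyword."""
    return Counter(e["host"] for e in indexer_events if keyword in e["raw"])


def reduce_phase(partial_results):
    """Run on the search head: merge the partial counts returned by the indexers."""
    merged = Counter()
    for partial in partial_results:
        merged.update(partial)
    return merged


# Hypothetical events held by two separate indexers.
indexer_a = [{"host": "web1", "raw": "error disk"}, {"host": "web2", "raw": "ok"}]
indexer_b = [{"host": "web1", "raw": "error net"}, {"host": "web3", "raw": "error cpu"}]

partials = [map_phase(events, "error") for events in (indexer_a, indexer_b)]
merged = reduce_phase(partials)
```

Only the small per-host counters cross the network in this sketch, rather than the matching events themselves, which is one way the approach can limit data transfers.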

2.8.2. Keyword Index

As described above with reference to the flow charts in FIG. 3 and FIG. 4, data intake and query system 106 can construct and maintain one or more keyword indices to facilitate rapidly identifying events containing specific keywords. This can greatly speed up the processing of queries involving specific keywords. As mentioned above, to build a keyword index, an indexer first identifies a set of keywords. Then, the indexer includes the identified keywords in an index, which associates each stored keyword with references to events containing that keyword, or to locations within events where that keyword is located. When an indexer subsequently receives a keyword-based query, the indexer can access the keyword index to quickly identify events containing the keyword.

2.8.3. High Performance Analytics Store

To speed up certain types of queries, some embodiments of system 106 make use of a high performance analytics store, which is referred to as a “summarization table,” that contains entries for specific field-value pairs. Each of these entries keeps track of instances of a specific value in a specific field in the event data and includes references to events containing the specific value in the specific field. For example, an entry in a summarization table can keep track of occurrences of the value “94107” in a “ZIP code” field of a set of events, wherein the entry includes references to all of the events that contain the value “94107” in the ZIP code field. This enables the system to quickly process queries that seek to determine how many events have a particular value for a particular field, because the system can examine the entry in the summarization table to count instances of the specific value in the field without having to go through the individual events or do extractions at search time. Also, if the system needs to process all events that have a specific field-value combination, the system can use the references in the summarization table entry to directly access the events to extract further information without having to search all of the events to find the specific field-value combination at search time.

In some embodiments, the system maintains a separate summarization table for each of the above-described time-specific buckets that store events for a specific time range, wherein a bucket-specific summarization table includes entries for specific field-value combinations that occur in events in the specific bucket. Alternatively, the system can maintain a separate summarization table for each indexer, wherein the indexer-specific summarization table only includes entries for the events in a data store that is managed by the specific indexer.

The summarization table can be populated by running a “collection query” that scans a set of events to find instances of a specific field-value combination, or alternatively instances of all field-value combinations for a specific field. A collection query can be initiated by a user, or can be scheduled to occur automatically at specific time intervals. A collection query can also be automatically launched in response to a query that asks for a specific field-value combination.

In some cases, the summarization tables may not cover all of the events that are relevant to a query. In this case, the system can use the summarization tables to obtain partial results for the events that are covered by summarization tables, but may also have to search through other events that are not covered by the summarization tables to produce additional results. These additional results can then be combined with the partial results to produce a final set of results for the query. This summarization table and associated techniques are described in more detail in U.S. Pat. No. 8,682,925, issued on Mar. 25, 2014.
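
As a non-limiting illustration, a summarization table and its use for a count query can be sketched as follows. The event layout, the field name “zip,” and the function names are hypothetical. The sketch shows a collection query building entries keyed by field-value pair, and a count that uses the table for covered events while scanning only the events the table does not cover.

```python
from collections import defaultdict


def collection_query(events, field):
    """Build entries mapping (field, value) to the ids of events holding that value."""
    table = defaultdict(list)
    for event_id, event in events.items():
        if field in event:
            table[(field, event[field])].append(event_id)
    return table


def count_field_value(table, covered_ids, events, field, value):
    """Count events with field=value: table entry for covered events, scan for the rest."""
    count = len(table.get((field, value), []))
    for event_id, event in events.items():
        if event_id not in covered_ids and event.get(field) == value:
            count += 1
    return count


# Events 1 and 2 are summarized; event 3 arrives after the collection query ran.
events = {1: {"zip": "94107"}, 2: {"zip": "10001"}, 3: {"zip": "94107"}}
covered = {1, 2}
table = collection_query({i: events[i] for i in covered}, "zip")
total = count_field_value(table, covered, events, "zip", "94107")
```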

2.8.4. Accelerating Report Generation

In some embodiments, a data server system such as the SPLUNK® ENTERPRISE system can accelerate the process of periodically generating updated reports based on query results. To accelerate this process, a summarization engine automatically examines the query to determine whether generation of updated reports can be accelerated by creating intermediate summaries. (This is possible if results from preceding time periods can be computed separately and combined to generate an updated report. In some cases, it is not possible to combine such incremental results, for example where a value in the report depends on relationships between events from different time periods.) If reports can be accelerated, the summarization engine periodically generates a summary covering data obtained during a latest non-overlapping time period. For example, where the query seeks events meeting specified criteria, a summary for the time period includes only events within the time period that meet the specified criteria. Similarly, if the query seeks statistics calculated from the events, such as the number of events that match the specified criteria, then the summary for the time period includes the number of events in the period that match the specified criteria.

In parallel with the creation of the summaries, the summarization engine schedules the periodic updating of the report associated with the query. During each scheduled report update, the query engine determines whether intermediate summaries have been generated covering portions of the time period covered by the report update. If so, then the report is generated based on the information contained in the summaries. Also, if additional event data has been received and has not yet been summarized, and is required to generate the complete report, the query can be run on this additional event data. Then, the results returned by this query on the additional event data, along with the partial results obtained from the intermediate summaries, can be combined to generate the updated report. This process is repeated each time the report is updated. Alternatively, if the system stores events in buckets covering specific time ranges, then the summaries can be generated on a bucket-by-bucket basis. Note that producing intermediate summaries can save the work involved in re-running the query for previous time periods, so only the newer event data needs to be processed while generating an updated report. These report acceleration techniques are described in more detail in U.S. Pat. No. 8,589,403, issued on 19 Nov. 2013, and U.S. Pat. No. 8,412,696, issued on 2 Apr. 2011.
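
As a non-limiting illustration, report acceleration for a simple count query can be sketched as follows. The sixty-second periods, the event tuples, and the function names are assumptions for this example. Each intermediate summary stores the count for one non-overlapping time period, and an updated report adds the stored counts to a count computed only over events not yet summarized.

```python
def summarize_period(events, period_start, period_end, predicate):
    """Intermediate summary: number of matching events with start <= ts < end."""
    return sum(
        1 for ts, raw in events if period_start <= ts < period_end and predicate(raw)
    )


def updated_report(summaries, events, summarized_until, predicate):
    """Combine stored summaries with a query over the not-yet-summarized events."""
    fresh = sum(1 for ts, raw in events if ts >= summarized_until and predicate(raw))
    return sum(summaries.values()) + fresh


def is_error(raw):
    """Hypothetical query criterion for this sketch."""
    return "error" in raw


events = [(10, "error a"), (70, "ok"), (80, "error b"), (130, "error c")]
summaries = {
    0: summarize_period(events, 0, 60, is_error),
    60: summarize_period(events, 60, 120, is_error),
}
report = updated_report(summaries, events, 120, is_error)
```

A count combines across periods by simple addition, which is why it can be accelerated this way; a statistic that depends on relationships between events in different periods would not combine so directly.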

2.9. Security Features

The SPLUNK® ENTERPRISE platform provides various schemas, dashboards and visualizations that make it easy for developers to create applications to provide additional capabilities. One such application is the SPLUNK® APP FOR ENTERPRISE SECURITY, which performs monitoring and alerting operations and includes analytics to facilitate identifying both known and unknown security threats based on large volumes of data stored by the SPLUNK® ENTERPRISE system. This differs significantly from conventional Security Information and Event Management (SIEM) systems that lack the infrastructure to effectively store and analyze large volumes of security-related event data. Traditional SIEM systems typically use fixed schemas to extract data from pre-defined security-related fields at data ingestion time, wherein the extracted data is typically stored in a relational database. This data extraction process (and associated reduction in data size) that occurs at data ingestion time inevitably hampers future incident investigations, when all of the original data may be needed to determine the root cause of a security issue, or to detect the tiny fingerprints of an impending security threat.

In contrast, the SPLUNK® APP FOR ENTERPRISE SECURITY system stores large volumes of minimally processed security-related data at ingestion time for later retrieval and analysis at search time when a live security threat is being investigated. To facilitate this data retrieval process, the SPLUNK® APP FOR ENTERPRISE SECURITY provides pre-specified schemas for extracting relevant values from the different types of security-related event data, and also enables a user to define such schemas.

The SPLUNK® APP FOR ENTERPRISE SECURITY can process many types of security-related information. In general, this security-related information can include any information that can be used to identify security threats. For example, the security-related information can include network-related information, such as IP addresses, domain names, asset identifiers, network traffic volume, uniform resource locator strings, and source addresses. (The process of detecting security threats for network-related information is further described in U.S. patent application Ser. Nos. 13/956,252, and 13/956,262.) Security-related information can also include endpoint information, such as malware infection data and system configuration information, as well as access control information, such as login/logout information and access failure notifications. The security-related information can originate from various sources within a data center, such as hosts, virtual machines, storage devices and sensors. The security-related information can also originate from various sources in a network, such as routers, switches, email servers, proxy servers, gateways, firewalls and intrusion-detection systems.

During operation, the SPLUNK® APP FOR ENTERPRISE SECURITY facilitates detecting so-called “notable events” that are likely to indicate a security threat. These notable events can be detected in a number of ways: (1) an analyst can notice a correlation in the data and can manually identify a corresponding group of one or more events as “notable;” or (2) an analyst can define a “correlation search” specifying criteria for a notable event, and every time one or more events satisfy the criteria, the application can indicate that the one or more events are notable. An analyst can alternatively select a pre-defined correlation search provided by the application. Note that correlation searches can be run continuously or at regular intervals (e.g., every hour) to search for notable events. Upon detection, notable events can be stored in a dedicated “notable events index,” which can be subsequently accessed to generate various visualizations containing security-related information. Also, alerts can be generated to notify system operators when important notable events are discovered.
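
As a non-limiting illustration, a correlation search can be thought of as a stored criterion that is evaluated against a batch of events each time the search runs. The criterion below (a host with at least three authentication failures in the batch), the event layout, and the names are purely hypothetical and do not reflect any correlation search actually provided by the application.

```python
from collections import Counter

FAILURE_THRESHOLD = 3  # hypothetical criterion for this sketch


def correlation_search(events):
    """Return hosts whose authentication failures in this batch meet the threshold."""
    failures = Counter(e["host"] for e in events if e["action"] == "failure")
    return sorted(host for host, n in failures.items() if n >= FAILURE_THRESHOLD)


# Hypothetical batch of events from one scheduled run of the search.
batch = [
    {"host": "db1", "action": "failure"},
    {"host": "db1", "action": "failure"},
    {"host": "web1", "action": "failure"},
    {"host": "db1", "action": "failure"},
    {"host": "web1", "action": "success"},
]
notable_hosts = correlation_search(batch)
```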

The SPLUNK® APP FOR ENTERPRISE SECURITY provides various visualizations to aid in discovering security threats, such as a “key indicators view” that enables a user to view security metrics of interest, such as counts of different types of notable events. For example, FIG. 8A illustrates an example key indicators view 800 that comprises a dashboard, which can display a value 801 for various security-related metrics, such as malware infections 802. It can also display a change in a metric value 803, which indicates that the number of malware infections increased by 63 during the preceding interval. Key indicators view 800 additionally displays a histogram panel 804 that displays a histogram of notable events organized by urgency values, and a histogram of notable events organized by time intervals. This key indicators view is described in further detail in pending U.S. patent application Ser. No. 13/956,338 filed Jul. 31, 2013.

These visualizations can also include an “incident review dashboard” that enables a user to view and act on “notable events.” These notable events can include: (1) a single event of high importance, such as any activity from a known web attacker; or (2) multiple events that collectively warrant review, such as a large number of authentication failures on a host followed by a successful authentication. For example, FIG. 8B illustrates an example incident review dashboard 810 that includes a set of incident attribute fields 811 that, for example, enables a user to specify a time range field 812 for the displayed events. It also includes a timeline 813 that graphically illustrates the number of incidents that occurred in one-hour time intervals over the selected time range. It additionally displays an events list 814 that enables a user to view a list of all of the notable events that match the criteria in the incident attribute fields 811. To facilitate identifying patterns among the notable events, each notable event can be associated with an urgency value (e.g., low, medium, high, critical), which is indicated in the incident review dashboard. The urgency value for a detected event can be determined based on the severity of the event and the priority of the system component associated with the event. The incident review dashboard is described further in “http://docs.splunk.com/Documentation/PCI/2.1.1/User/IncidentReviewdashboard.”

In one embodiment, users may be provided access to an enterprise security application such as the SPLUNK® APP FOR ENTERPRISE SECURITY based on any of a number of different subscription plans. As one example, users may pay a flat fee or purchase a subscription that enables a user to store an amount of data at the data intake and query system up to a data volume limit. If devices associated with a particular user account send an amount of data that exceeds an associated data volume limit, for example, the user may then be prompted to purchase additional storage space, the system may automatically increase the user's available storage space at an additional cost to the user, or perform other actions. Elastically scaling data storage available to users of a data intake and query system is described in U.S. application Ser. No. 13/572,434 filed on 10 Aug. 2012, and which is hereby incorporated by reference in its entirety for all purposes.

2.10. Data Center Monitoring

As mentioned above, the SPLUNK® ENTERPRISE platform provides various features that make it easy for developers to create various applications. One such application is the SPLUNK® APP FOR VMWARE®, which performs monitoring operations and includes analytics to facilitate diagnosing the root cause of performance problems in a data center based on large volumes of data stored by the SPLUNK® ENTERPRISE system.

This differs from conventional data-center-monitoring systems that lack the infrastructure to effectively store and analyze large volumes of performance information and log data obtained from the data center. In conventional data-center-monitoring systems, this performance data is typically pre-processed prior to being stored, for example by extracting pre-specified data items from the performance data and storing them in a database to facilitate subsequent retrieval and analysis at search time. However, the rest of the performance data is not saved and is essentially discarded during pre-processing. In contrast, the SPLUNK® APP FOR VMWARE® stores large volumes of minimally processed performance information and log data at ingestion time for later retrieval and analysis at search time when a live performance issue is being investigated.

The SPLUNK® APP FOR VMWARE® can process many types of performance-related information. In general, this performance-related information can include any type of performance-related data and log data produced by virtual machines and host computer systems in a data center. In addition to data obtained from various log files, this performance-related information can include values for performance metrics obtained through an application programming interface (API) provided as part of the vSphere Hypervisor™ system distributed by VMware, Inc. of Palo Alto, Calif. For example, these performance metrics can include: (1) CPU-related performance metrics; (2) disk-related performance metrics; (3) memory-related performance metrics; (4) network-related performance metrics; (5) energy-usage statistics; (6) data-traffic-related performance metrics; (7) overall system availability performance metrics; (8) cluster-related performance metrics; and (9) virtual machine performance statistics. For more details about such performance metrics, please see U.S. patent application Ser. No. 14/167,316 filed 29 Jan. 2014, which is hereby incorporated herein by reference. Also, see “vSphere Monitoring and Performance,” Update 1, vSphere 5.5, EN-001357-00, http://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esx-i-vcenter-server-551-monitoring-performance-guide.pdf.

To facilitate retrieving information of interest from performance dataand log files, the SPLUNK® APP FOR VMWARE® provides pre-specifiedschemas for extracting relevant values from different types ofperformance-related event data, and also enables a user to define suchschemas.

The SPLUNK® APP FOR VMWARE® additionally provides various visualizationsto facilitate detecting and diagnosing the root cause of performanceproblems. For example, one such visualization is a “proactive monitoringtree” that enables a user to easily view and understand relationshipsamong various factors that affect the performance of a hierarchicallystructured computing system. This proactive monitoring tree enables auser to easily navigate the hierarchy by selectively expanding nodesrepresenting various entities (e.g., virtual centers or computingclusters) to view performance information for lower-level nodesassociated with lower-level entities (e.g., virtual machines or hostsystems). Example node-expansion operations are illustrated in FIG. 8C,wherein nodes 833 and 834 are selectively expanded. Note that nodes831-839 can be displayed using different patterns or colors to representdifferent performance states, such as a critical state, a warning state,a normal state or an unknown/offline state. The ease of navigationprovided by selective expansion in combination with the associatedperformance-state information enables a user to quickly diagnose theroot cause of a performance problem. The proactive monitoring tree isdescribed in further detail in U.S. patent application Ser. No.14/235,490 filed on 15 Apr. 2014, which is hereby incorporated herein byreference for all possible purposes.

The SPLUNK® APP FOR VMWARE® also provides a user interface that enablesa user to select a specific time range and then view heterogeneous data,comprising events, log data and associated performance metrics, for theselected time range. For example, the screen illustrated in FIG. 8Ddisplays a listing of recent “tasks and events” and a listing of recent“log entries” for a selected time range above a performance-metric graphfor “average CPU core utilization” for the selected time range. Notethat a user is able to operate pull-down menus 842 to selectivelydisplay different performance metric graphs for the selected time range.This enables the user to correlate trends in the performance-metricgraph with corresponding event and log data to quickly determine theroot cause of a performance problem. This user interface is described inmore detail in U.S. patent application Ser. No. 14/167,316 filed on 29Jan. 2014, which is hereby incorporated herein by reference for allpossible purposes.

2.11. Cloud-Based System Overview

The example data intake and query system 106 described in reference toFIG. 2 comprises several system components, including one or moreforwarders, indexers, and search heads. In some environments, a user ofa data intake and query system 106 may install and configure, oncomputing devices owned and operated by the user, one or more softwareapplications that implement some or all of these system components. Forexample, a user may install a software application on server computersowned by the user and configure each server to operate as one or more ofa forwarder, an indexer, a search head, etc. This arrangement generallymay be referred to as an “on-premises” solution, meaning the system 106is installed and operates on computing devices directly controlled bythe user of the system. Some users may prefer an on-premises solutionsince it may provide a greater level of control over the configurationof certain aspects of the system. However, other users may insteadprefer an arrangement in which the user is not directly responsible forproviding and managing the computing devices upon which variouscomponents of system 106 operate.

In one embodiment, to provide an alternative to an entirely on-premises environment for system 106, one or more of the components of a data intake and query system instead may be provided as a cloud-based service. In this context, a cloud-based service refers to a service hosted by one or more computing resources that are accessible to end users over a network, for example, by using a web browser or other application on a client device to interface with the remote computing resources. For example, a service provider may provide a cloud-based data intake and query system by managing computing resources configured to implement various aspects of the system (e.g., forwarders, indexers, search heads, etc.) and providing access to the system to end users via a network. Typically, a user may pay a subscription or other fee to use such a service, and each subscribing user to the cloud-based service may be provided with an account that enables the user to configure a customized cloud-based system based on the user's preferences.

FIG. 9 illustrates a block diagram of an example cloud-based data intake and query system. Similar to the system of FIG. 2, the networked computer system 900 includes input data sources 202 and forwarders 204. In the example system 900 of FIG. 9, one or more forwarders 204 and client devices 902 are coupled to a cloud-based data intake and query system 906 via one or more networks 904. Network 904 broadly represents one or more LANs, WANs, cellular networks, and/or internetworks using any of wired, wireless, terrestrial microwave, satellite links, etc., and may include the public Internet, and is used by client devices 902 and forwarders 204 to access the system 906. Similar to system 106, each of the forwarders 204 may be configured to receive data from an input source and to forward the data to other components of the system 906 for further processing.

In an embodiment, a cloud-based data intake and query system 906 maycomprise a plurality of system instances 908. In general, each systeminstance 908 may include one or more computing resources managed by aprovider of the cloud-based system 906 made available to a particularsubscriber. The computing resources comprising a system instance 908may, for example, include one or more servers or other devicesconfigured to implement one or more forwarders, indexers, search heads,and other components of a data intake and query system, similar tosystem 106. As indicated above, a subscriber may use a web browser orother application of a client device 902 to access a web portal or otherinterface that enables the subscriber to configure an instance 908.

Providing a data intake and query system as described in reference to system 106 as a cloud-based service presents a number of challenges. Each of the components of a system 106 (e.g., forwarders, indexers and search heads) may at times refer to various configuration files stored locally at each component. These configuration files typically may involve some level of user configuration to accommodate particular types of data a user desires to analyze and to account for other user preferences. However, in a cloud-based service context, users typically may not have direct access to the underlying computing resources implementing the various system components (e.g., the computing resources comprising each system instance 908). Thus, users may desire to make such configurations indirectly, for example, using one or more web-based interfaces. Thus, the techniques and systems described herein for providing user interfaces that enable a user to configure source type definitions are applicable to both on-premises and cloud-based service contexts, or some combination thereof.

2.12. Other Example Search Support Systems

In general, a search support system may be any system that enables themanagement, storage, and retrieval of data. The example operatingenvironment described above illustrates an example search support system(e.g., SPLUNK® ENTERPRISE) that operates on semi-structured orcompletely unstructured data and also provides a late-binding schema,which imposes structure on the data at query time rather than at storageor ingestion time. Other example search support systems that are capableof operating on semi-structured and unstructured data include Hadoop,Cassandra, and MongoDB.

The Hadoop data system, for example, is a framework for distributedstorage and processing of large data sets. A Hadoop data systemdistributes storage of data (e.g., application log files, network data,etc.) by splitting the data into blocks and storing the blocks amongstnodes of a cluster. To search or otherwise process the data stored bythe distributed set of nodes, instructions are sent to nodes causing thenodes to process locally stored data in parallel with other nodes. Forexample, the instructions may implement a MapReduce processing modelthat distributes and parallelizes processing of the data across thecluster nodes. In a Hadoop environment, unstructured data may bereceived and stored by a cluster of nodes without parsing the data intoevents before the data is stored, as in other data intake examplesdescribed herein. Instead, the system may generate events from thestored raw data and determine which events are responsive to a searchquery at search time using distributed event parsing and searchingprocesses that are executed by the nodes in parallel. The Cassandra andMongoDB database systems similarly represent storage solutions in whichraw data may be stored without event parsing prior to storage. Similarto a Hadoop system, events may be generated from data stored in aCassandra or MongoDB system at search time rather than when the raw datais received.

3.0. FUNCTIONAL OVERVIEW

The arrangement of FIG. 1 may implement a system that enables a dataintake and query system to receive raw data from one or more devices(e.g., devices 102), to ingest the raw data to generate events, tocalculate a number of events generated over one or more defined timeperiods, and to generate various visualizations enabling users to viewvarious metrics related to the data ingestion, among other features. Thecalculation of a number of events generated from raw data ingested by adata intake and query system may be relative to a defined set ofdevices, for example, a collection of devices associated with one ormore particular user accounts. Users associated with the one or moreparticular user accounts may use the various visualizations to monitor arate at which the data intake and query system is producing events fromraw data sent from devices associated with the users.

In one embodiment, a data intake and query system may charge a fee tousers of the system based on a rate at which the system generates eventsfrom data originating from devices associated with a user. For example,a calculation of a number of events generated by the system during aparticular time period (e.g., a day, week, month, etc.) may be used todetermine a fee to charge the user for use of the system. As describedin more detail hereinafter, other calculations based on a number ofevents generated by the system may be used to determine a fee, such asan average number of events generated over a period of time, a peaknumber of events generated over a period of time, based on eventgeneration tiers, and others.

FIG. 10 is a flow diagram that illustrates an example process forcalculating a number of events generated by a data intake and querysystem during one or more defined time periods.

3.1. Collecting Device Data

At block 1002, a data intake and query system receives raw data from oneor more devices. For example, one or more devices 102 may produce theraw data during operation of the devices and send the data to a dataintake and query system 106 via one or more networks 104. The devices102 may send the data directly to a data intake and query system 106, ora service provider (e.g., service provider 108A or 108B) may collect thedata from the devices 102 for which it is responsible and subsequentlysend the data to the system 106. The data generally may include logfiles, output files, network data, or any other machine data that may begenerated by the devices. The data may also include user-generated data,such as analyst logs, audit logs, etc.

At block 1004, a data intake and query system parses the raw data toorganize the data into a plurality of events. In an embodiment, a dataintake and query system 106 parses raw data into a plurality of eventsin general by determining the boundaries of events in the raw data. Oneor more rules for determining event boundaries for a particular type ofraw data may be specified, for example, in a source type configurationfor the particular data type. Each event of the plurality of eventsincludes a portion of the raw data as defined by the determinedboundaries for the event.

At block 1006, a data intake and query system parsing raw data into aplurality of events further comprises determining a timestamp for eachevent. The data intake and query system 106 may again refer to a sourcetype definition associated with the raw data that specifies instructionsfor determining a timestamp for each event. For example, theinstructions may indicate rules for extracting a time value from the rawdata, to use a timestamp of a previous event, to create a timestampbased on a time the event data was received or generated, or based onany other rules.
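The parsing described for blocks 1004-1006 can be illustrated with a minimal sketch. It assumes a hypothetical source type rule in which every event begins on a line that starts with a `YYYY-MM-DD HH:MM:SS` time value; the names `EVENT_START`, `split_events`, and `assign_timestamps` are illustrative only and are not part of the described system.

```python
import re
from datetime import datetime

# Assumed (hypothetical) source type rule: an event starts on a line
# that begins with a "YYYY-MM-DD HH:MM:SS" time value.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_events(raw):
    """Split raw text into events at lines that match EVENT_START."""
    events = []
    for line in raw.splitlines():
        if EVENT_START.match(line) or not events:
            events.append(line)
        else:
            # A continuation line belongs to the event currently being built.
            events[-1] += "\n" + line
    return events


def assign_timestamps(events, received_at):
    """Pair each event with a timestamp.

    Order of preference: a time value extracted from the event text,
    then the timestamp of the previous event, then the time of receipt.
    """
    stamped = []
    previous = None
    for text in events:
        match = EVENT_START.match(text)
        if match:
            ts = datetime.strptime(match.group(0), TIME_FORMAT)
        elif previous is not None:
            ts = previous
        else:
            ts = received_at
        stamped.append((ts, text))
        previous = ts
    return stamped
```

In this sketch the boundary rule and the timestamp rule are both driven by the same pattern; an actual source type definition could specify them separately.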

In one embodiment, each of the plurality of events may be stored in anindex of one or more indexes. The selection of a particular index tostore each event may be based on any number of factors, including a useraccount associated with the events, a particular device from which thedata was received, a type of data received, a project associated with adevice from which the data was received, whether one or more ingestionthresholds have been exceeded, etc. As another example, one or more ofthe events may be stored in a database, a flat file, or using any otherdata persistence mechanism.

As used herein, generation of an event may generally refer to a processas described above in reference to blocks 1004-1006, including parsingraw data to identify the event data, determining a timestamp for theevent, and storing the event in an index. Generation of an event mayalso refer to only some portion of the process, such as parsing anddetermining an event timestamp, or storing the event in an index.

3.2. Measuring Data Ingestion

At block 1008, a data intake and query system calculates a number ofevents generated during one or more defined time periods. A data intakeand query system 106 may calculate a number of events, for example, inresponse to a request to generate an interface displaying various dataingestion metrics, to calculate a fee based on a number of eventsgenerated, or for other purposes.

In general, a data intake and query system 106 may count the occurrenceof an event generation at any point during the process of collecting,parsing, storing, etc., raw data, as described above in reference toblocks 1002-1006. For example, a data intake and query system 106 maydetermine that an event is generated each time the system parses aseparate event from raw data, each time a timestamp is associated withan event, or each time an event is stored in an index or in other datastore.

In one embodiment, a number of events stored in an index or other data store may be counted by querying one or more relevant indexes for a count of events stored in the indexes. For example, each of the events stored in an index may be associated with a time value that indicates a time at which the event was stored in the index. If a data intake and query system 106 receives a request to determine a number of events stored in one or more indexes on a particular day, the system may query the indexes for a count of events associated with a stored time value that is within the specified time period.

In an embodiment, a data intake and query system 106 may determine anumber of events by counting the events as the events are stored in anindex or other data store. For example, each time an event generatedfrom raw data received by the system is stored in a particular index,the system may increment one or more event counters for a user accountassociated with the devices from which the raw data originated. An eventcounter may, for example, track a total number of events stored in eachindex per day or per another period of time, where the total count ofevents stored at the end of the time period is saved and the counter maybe reset. If a request for a number of events generated on a particularday is then received, for example, a data intake and query system 106may then retrieve the counter value stored for the particular day. Otheraggregate values may be calculated (e.g., an average number of eventsgenerated per day, a maximum number of events generated per day, etc.)by retrieving the counter values stored for multiple days.
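The counter-based approach described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the system's implementation: the class name `EventCounter`, its methods, and the `(account, index, day)` key are all hypothetical.

```python
from collections import defaultdict
from datetime import date


class EventCounter:
    """Hypothetical per-account, per-index, per-day event counter."""

    def __init__(self):
        # Maps (account, index, day) -> number of events stored.
        self._counts = defaultdict(int)

    def record(self, account, index, day, n=1):
        """Increment the counter each time events are stored in an index."""
        self._counts[(account, index, day)] += n

    def count_for_day(self, account, day):
        """Total events stored for an account on one day, across indexes."""
        return sum(
            count
            for (acct, _index, d), count in self._counts.items()
            if acct == account and d == day
        )

    def daily_totals(self, account):
        """Per-day totals for an account, usable for averages or maxima."""
        totals = defaultdict(int)
        for (acct, _index, d), count in self._counts.items():
            if acct == account:
                totals[d] += count
        return dict(totals)
```

Keying on the day avoids an explicit reset step: a new day simply starts a new counter, and earlier days remain available for aggregate values.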

In an embodiment, a data intake and query system determining a number ofevents generated during a particular time period may include determininga number of events associated with a particular user account and/orproject. For example, a service provider 110A may be associated with oneor more user accounts, and the user accounts may be associated with oneor more of the devices 102 managed by the service provider 110A. Inresponse to a request from a user account associated with serviceprovider 110A for display of data ingestion information, for example,the data intake and query system 106 may determine a number of eventsgenerated during a particular time period and that are associated withthat particular user account.

Furthermore, a service provider 110A or other user may configure one ormore projects to logically separate data ingested by the data intake andquery system 106 for particular sets of devices 102 managed by the user.For example, a user may desire to create one project for device group108A, and a second project for device group 108B. As another example, auser may desire to create separate projects for particular types of dataingested, types of applications hosted by particular devices, etc. Asdescribed in more detail hereinafter, the calculation of a number ofevents generated for a particular user account and/or project may enableuser account and/or project specific presentation of data ingestionmetrics, calculation of fees, etc.

In an embodiment, if a data intake and query system 106 stores or operates on raw data without deriving events from the data before the data is stored, the system may determine a number of events associated with various defined time periods by periodically analyzing the stored data. For example, if the raw data is stored in a Hadoop-compatible file system or other similar storage system, the system may determine a number of events associated with a defined time period by deriving events from stored raw data received during the defined time period. For example, the data intake and query system 106 may analyze the stored data to determine a number of events associated with various time periods once an hour, once a day, etc. As another example, the data intake and query system 106 may determine a number of events associated with stored raw data upon request, for example, in response to a request to generate one or more data ingestion metrics, in response to a search request, or in response to a request to calculate a fee based on a number of stored events.

3.3. Presenting Data Ingestion Information

According to embodiments described herein, a data intake and querysystem 106 provides various visualizations that display metrics andother information related to a rate at which data is ingested by thesystem. In one embodiment, the data intake and query system 106 providesone or more interfaces that display one or more values, charts, graphs,and other visual indications of a number of events generated by the dataintake and query system during one or more defined periods of time. Forexample, one interface may display, among other information, values thatindicate, for a particular user account and/or project, a maximum,minimum, and average number of events generated by the data intake andquery system per day originating from devices associated with the useraccount and/or project.

Referring again to FIG. 10, at block 1010, a data intake and query system causes display of a user interface that displays one or more metrics based on the calculated number of events. In an embodiment, a user interface generated by a data intake and query system 106 may include a web browser-based interface, an interface of a standalone desktop or mobile application, or any other type of interface. For example, a user may use a web browser or other application to access various components of the system 106, including one or more interfaces displaying data ingestion information. The one or more metrics may include, for example, for one or more particular time periods, a total number of events generated, a maximum number of events generated, a minimum number of events generated, an average number of events generated, event generation trend data, etc. As another example, a metric may display an amount of time elapsed to generate a particular number of events. For example, a displayed metric may indicate that two hours elapsed to generate the first 500,000 events, five hours elapsed to generate the next 500,000 events, etc.

FIG. 11 depicts an example “audit view” interface that enables a user toview metrics related to data ingested by a data intake and query system.For example, interface 1100 comprises a dashboard, which can displayvalues 1102 for various associated data ingestion metrics 1104. In FIG.11, for example, values 1102 include a value indicating a minimum numberof events generated per day, an average number of events generated perday, a maximum number of events generated per day, and a recent numberof events generated per day. The maximum, minimum, average, and recentvalues may be calculated relative to a defined time period (e.g., in thepast year) or for the entire duration of time during which events havebeen collected from devices associated with the particular user account.Values 1102 further include a trend value (−4M) indicating a change inthe number of events generated from a previous time period. Theparticular values in FIG. 11 may indicate, for example, that the dataintake and query system generated 8 million events during the currentday, which is 4 million events less than the number generated theprevious day.
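The dashboard values described for FIG. 11 can be derived from a series of per-day event counts. The following is a minimal sketch; the function name `ingestion_metrics` and the dictionary keys are hypothetical, and the sample counts are chosen only to reproduce the 8 million and −4M figures mentioned above.

```python
def ingestion_metrics(daily_counts):
    """Compute dashboard-style metrics from per-day event counts.

    daily_counts is ordered oldest first; the last entry is the most
    recent day. The trend is the change from the previous day.
    """
    if not daily_counts:
        raise ValueError("at least one daily count is required")
    recent = daily_counts[-1]
    trend = recent - daily_counts[-2] if len(daily_counts) > 1 else 0
    return {
        "minimum": min(daily_counts),
        "average": sum(daily_counts) / len(daily_counts),
        "maximum": max(daily_counts),
        "recent": recent,
        "trend": trend,
    }


# Illustrative counts: 12 million events the previous day, 8 million today.
metrics = ingestion_metrics([10_000_000, 12_000_000, 8_000_000])
print(metrics["recent"], metrics["trend"])
```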

FIG. 11 also displays a bar chart visualization 1106, which indicates a number of events generated per day over a number of preceding time periods. For example, for each day between March 11 and March 30, the bar chart visualization 1106 of FIG. 11 includes a bar representing a number of events generated by the data intake and query system during that particular day. The display of historical data such as the bar chart depicted in FIG. 11 may, for example, enable a user to compare a rate of event generation of two or more different time periods. This comparison may enable a user to identify trends or anomalies in event generation that may assist the user in diagnosing issues with their devices and/or adjusting a licensed amount of data ingestion.

FIG. 11 further includes two tables 1108 and 1110 which display other metrics related to data ingestion. For example, table 1108 includes rows indicating a number of events generated by the data intake and query system for a series of days. Table 1108 also includes one column that indicates a total number of events generated during the particular day corresponding to each row, and another column that indicates an average number of events generated per second during that day. The example metrics depicted in FIG. 11 are based on time periods corresponding to a day; however, other displayed metrics may be based on other time periods such as, for example, events per second, events per hour, events per week, events per month, etc.

Table 1110 displays information that indicates, for each of a pluralityof different indexes, a number of events stored in the particular index.For example, a particular user may configure a data intake and querysystem 106 so that particular types of data, or data originating fromparticular devices, are stored in a particular index, and theinformation displayed in table 1110 provides an indication of a numberof events stored in each of the particular indexes during a particulartime period.

FIG. 12 depicts another example audit view interface that enables a userto configure various alert thresholds related to one or more dataingestion metrics. For example, FIG. 12 depicts an example dashboarddisplaying various data ingestion metrics similar to the dashboarddepicted in FIG. 11. The dashboard of FIG. 12 further includes thresholdinput components 1202. The threshold input components 1202 may bedisplayed, for example, in response to a user selecting the “Edit”button near the top of the page, or by providing other input.

In an embodiment, the threshold input components 1202 generally enable auser to specify one or more threshold values related to one or more ofthe displayed metrics. For example, a user may provide an input value oftwenty million at the input component 1202 below the “maximum events perday” metric value. In this example, the provided threshold value mayindicate a threshold number of events generated per day which, ifexceeded, the user desires to receive an alert, notification, or othertype of message informing the user that the threshold has been exceeded.Such an alert may possibly notify a user that unusual activity isoccurring at the user's devices, that a licensed amount of dataingestion is nearly exceeded or is already exceeded, or any otherinformation. The alert may include modifying a color of one or more ofthe displayed metrics values, displaying other visual information oninterface 1200, sending an alert (e.g., an email, text message, instantmessage, etc.) to one or more particular users, etc.

3.4. Monitoring Data Ingestion

As described above, a data intake and query system may calculate anumber of events generated by the system during one or more time periodsto generate one or more visualizations that enable users to view relatedmetrics and other information. In an embodiment, a data intake and querysystem may also use calculations of a number of events generated by thesystem during various time periods to determine fees to chargeassociated users for ingestion of the data, to monitor users exceeding alicensed amount of data ingestion, to generate alerts, and perform otheractions.

In one embodiment, a data intake and query system 106 may use a calculated number of events generated from data received from devices associated with a particular user to determine a fee to charge the user for ingestion of the data. For example, a data intake and query system may calculate a number of events generated for a particular user during a particular month and multiply the calculated number by an amount to charge per event generated, resulting in a total fee to charge for ingestion of data during the month. The duration of time to calculate events and an amount to charge per event generated may be determined based on a particular subscription or other arrangement between the user and the operator of the data intake and query system 106.

The example above describes calculating a fee for data ingestion based on a total count of events generated and a flat fee per event; however, many other pricing arrangements may be used. In one embodiment, a fee calculation may be based on a peak number of events generated during a defined period of time. For example, at the end of each week, a data intake and query system may determine a fee based on a peak number of events generated by the system during any one day of the week. The determined peak number of events may be used, for example, to determine an amount to charge per event for that week.

In an embodiment, a fee calculation may be based at least in part on atiered set of charged amounts for ingesting data. For example, a dataintake and query system 106 may be configured to charge a first amountper event for the first one million events each day, a second amount perevent for the next one million events generated, and so forth. Asanother example, a fee calculation may be based on a rate at which theevents are generated by a data intake and query system. For example, auser may be charged a first amount per event if one million events aregenerated from data received over a one hour time period, but charged adifferent amount per event if the same one million events are generatedfrom data received over a twelve hour time period.
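A tiered fee of the kind described above can be sketched as follows. The function name `tiered_fee`, the tier sizes, and all per-event rates are assumptions chosen for illustration; the source does not specify any amounts. `Decimal` is used so that currency arithmetic is exact.

```python
from decimal import Decimal


def tiered_fee(event_count, tiers, overflow_rate):
    """Compute a fee from a tiered set of per-event amounts.

    tiers is a list of (tier_size, rate_per_event) pairs applied in
    order; events beyond the last tier are charged at overflow_rate.
    """
    remaining = event_count
    fee = Decimal("0")
    for size, rate in tiers:
        if remaining <= 0:
            break
        billed = min(remaining, size)
        fee += billed * rate
        remaining -= billed
    if remaining > 0:
        fee += remaining * overflow_rate
    return fee


# Hypothetical rates: first million events, next million, then the rest.
TIERS = [
    (1_000_000, Decimal("0.0010")),
    (1_000_000, Decimal("0.0008")),
]
print(tiered_fee(2_500_000, TIERS, Decimal("0.0005")))
```

The flat-fee arrangement described earlier is the special case of an empty tier list, where every event is charged at `overflow_rate`.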

As another example, a fee calculation may be based at least in part on anumber of devices associated with a user which send data to the dataintake and query system for ingestion. For example, a user may becharged a first amount per event if data is received from a singledevice, and a different amount per event if data is received from twodevices, and so forth.

In one embodiment, a data intake and query system 106 may use a calculated number of events generated by the system to determine whether usage of the system by a particular user has exceeded a licensed amount of data ingestion. For example, a user of the data intake and query system 106 may pay for a subscription or flat-fee amount that enables the user to cause the system to ingest up to a particular amount of raw data received from the user's devices during one or more defined time periods. A licensed amount for a user may include one or more thresholds that determine the amount of data a user's devices are permitted to have ingested by the data intake and query system. One example of such a threshold may specify a maximum number of events that can be generated by the data intake and query system per day, per month, or based on any other defined period of time. As one particular example, a user may be associated with a licensed amount that specifies a threshold of ten million events generated by the system per day. As another example, a licensed amount may specify a threshold corresponding to a maximum average number of events generated per day, month, etc. In this way, if a user exceeds the threshold number of events on one particular day due to unusual circumstances, the user still does not exceed the licensed amount as long as the average number of events generated per day remains below the threshold amount.
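The difference between a per-day maximum threshold and a maximum-average threshold can be shown with a short sketch. The function names and the sample counts are hypothetical; the ten million figure is the example threshold given above.

```python
def exceeds_daily_limit(daily_counts, limit):
    """True if any single day's event count is above the per-day limit."""
    return any(count > limit for count in daily_counts)


def exceeds_average_limit(daily_counts, limit):
    """True if the average events per day is above the limit."""
    if not daily_counts:
        return False
    return sum(daily_counts) / len(daily_counts) > limit


# One unusual day (14 million) among otherwise typical days.
counts = [8_000_000, 14_000_000, 7_000_000]
LIMIT = 10_000_000
print(exceeds_daily_limit(counts, LIMIT))    # the 14 million day is over
print(exceeds_average_limit(counts, LIMIT))  # the average stays under
```

With these counts the single-day check reports an excess while the average check does not, which is the situation the average-based threshold is meant to tolerate.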

In one embodiment, in response to a data intake and query system 106determining that a calculated number of events exceeds a user's licensedamount, a data intake and query system may perform one or more actions.As one example, a data intake and query system 106 that determines thata particular user has exceeded a licensed amount may store excess eventsreceived from the user's devices in a non-searchable index. In anembodiment, if the user subsequently purchases additional capacity toincrease the licensed amount available to the user, the system may thenenable the user access to the indexed events by making thenon-searchable index searchable or moving the events to an existingsearchable index. In another embodiment, if the system determines that acalculated number of events exceeds a user's licensed amount, the systemmay automatically increase the licensed amount without direct input fromthe user. For example, the user may configure a setting granting thesystem permission to automatically increase the user's licensed amountif the amount is exceeded. This may include permission to automaticallycharge the user for the additional amount of capacity.

In one embodiment, in response to a data intake and query system 106 determining that a number of events generated during a defined time period exceeds an allocated event count, the system may automatically delete some or all of the events that exceed the allocated event count. For example, a data intake and query system 106 may store events that exceed an allocated event count in an overflow storage location that is non-searchable by users associated with the devices producing the data from which the events originated. The overflow storage location may have a limited amount of space for storing excess events. In one embodiment, as the overflow storage location reaches capacity, events may be deleted from the system. The events may be deleted from the overflow storage location based on a deletion policy such as first in, first out (FIFO), last in, first out (LIFO), random selection, or any other deletion policy. As another example, a data intake and query system 106 may store events that exceed an allocated event count in a storage location that is non-accessible to users. In addition to the data being non-searchable, users may be prevented from interacting with data stored in a non-accessible storage location in other ways, such as not being able to view the data, move the data to another location, extract the data from the system, etc. As yet another example, excess events may be deleted immediately and without intermediate storage in overflow storage. As yet another example, in response to determining that a generated number of events exceeds an allocated event count, a data intake and query system 106 may receive additional raw data but cease generating new events from the received data. The system may store the additional raw data in an overflow storage area and generate new events from the data if an allocated event count is increased, and the additional data may be deleted after a period of time if the allocated event count is not increased. As yet another example, in response to determining that a generated number of events exceeds an allocated event count, the system may reject additional raw data from associated devices entirely.
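As an illustration of the capacity-limited overflow storage location and its deletion policies, the following sketch evicts one stored event whenever a new excess event arrives at a full store. The class OverflowStore and its parameter names are hypothetical and are not taken from the specification.

```python
import random
from collections import deque


class OverflowStore:
    """Hypothetical bounded overflow location for events beyond the allocated count."""

    def __init__(self, capacity: int, policy: str = "fifo"):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if policy not in ("fifo", "lifo", "random"):
            raise ValueError(f"unknown deletion policy: {policy}")
        self.capacity = capacity
        self.policy = policy
        self.events = deque()

    def add(self, event) -> None:
        # Make room first so the store never holds more than `capacity` events.
        if len(self.events) >= self.capacity:
            self._evict()
        self.events.append(event)

    def _evict(self) -> None:
        if self.policy == "fifo":
            self.events.popleft()   # delete the oldest stored event
        elif self.policy == "lifo":
            self.events.pop()       # delete the most recently stored event
        else:
            # Random selection: delete one stored event chosen uniformly.
            del self.events[random.randrange(len(self.events))]


fifo = OverflowStore(capacity=2, policy="fifo")
for e in ["e1", "e2", "e3"]:
    fifo.add(e)
# fifo.events now holds e2 and e3; e1 was deleted when e3 arrived
```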

In one embodiment, a data intake and query system 106 may allow an allocated event count to be exceeded in certain cases. For example, in response to determining that a number of events generated during a defined time period has reached an allocated event count, the data intake and query system 106 may continue to generate and store up to a threshold number of additional events. The threshold number of additional events may be based on a particular number of events (e.g., 5000 additional events), a percentage of an allocated event count, etc. The system may allow an allocated event count to be exceeded, for example, to permit some amount of “leeway” in particular cases and/or for particular users. For example, the system may be configured to allow each user account to exceed an allocated event count a single time (or any other number of times) without penalty. If a user account exceeds an associated allocated event count on a single occasion, for example, the system may continue to generate and store up to a threshold amount of events in excess of the allocated event count. If the same user account exceeds the allocated event count on more than the permitted number of occasions, the system may perform other actions on the excess data, as described above, such as storing the data in a non-searchable index, deleting the data, etc. In one embodiment, users may purchase the ability to exceed an allocated event count a particular number of times, or the system may automatically charge an additional fee each time an allocated event count is exceeded beyond a permitted number of times.
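The leeway logic can be sketched as follows, assuming one call per defined time period. The name LeewayPolicy and its fields are hypothetical, and what happens to events beyond the returned limit (non-searchable storage, deletion, and so on) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class LeewayPolicy:
    """Hypothetical per-account leeway settings."""
    allocated_count: int
    threshold_extra: int       # e.g., 5000 additional events
    permitted_overages: int    # occasions on which the count may be exceeded
    overages_used: int = 0

    def events_allowed(self, offered: int) -> int:
        """Return how many of the events offered in one period may be generated and stored."""
        if offered <= self.allocated_count:
            return offered
        if self.overages_used < self.permitted_overages:
            # Leeway applies: allow up to the threshold beyond the allocated count.
            self.overages_used += 1
            return min(offered, self.allocated_count + self.threshold_extra)
        # Leeway exhausted: the excess is left to some other action.
        return self.allocated_count


policy = LeewayPolicy(allocated_count=1000, threshold_extra=50, permitted_overages=1)
first_period = policy.events_allowed(1030)    # 1030: within the threshold, leeway used
second_period = policy.events_allowed(1100)   # 1000: no overages remain
```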

In an embodiment, in response to a data intake and query system 106 determining that a calculated number of events exceeds a user's licensed amount, the system may generate one or more alerts. One example of an alert is a graphic or other element displayed on a user interface that informs the user that the licensed amount has been exceeded. As another example, the system may generate and send one or more e-mails, instant messages, text messages, or other communications to one or more specified users informing the users that the licensed amount is exceeded. A service provider may, for example, configure a data intake and query system 106 to send an alert to a sales representative when a licensed amount is exceeded so that the sales representative may contact an appropriate customer to inquire about increasing the licensed amount.

In an embodiment, an apparatus comprises a processor and is configured to perform any of the foregoing methods.

In an embodiment, a non-transitory computer readable storage medium stores software instructions which, when executed by one or more processors, cause performance of any of the foregoing methods.

Note that, although separate embodiments are discussed herein, any of the embodiments and/or partial embodiments discussed herein may be combined to form further embodiments.

4.0. IMPLEMENTATION MECHANISMS: HARDWARE OVERVIEW

According to an embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.

For example, FIG. 13 is a block diagram that illustrates a computer system 1300 upon which an embodiment may be implemented. Computer system 1300 includes a bus 1302 or other communication mechanism for communicating information, and a hardware processor 1304 coupled with bus 1302 for processing information. Hardware processor 1304 may be, for example, a general purpose microprocessor.

Computer system 1300 also includes a main memory 1306, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 1302 for storing information and instructions to be executed by processor 1304. Main memory 1306 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 1304. Such instructions, when stored in non-transitory storage media accessible to processor 1304, render computer system 1300 into a special-purpose machine that is customized to perform the operations specified in the instructions.

Computer system 1300 further includes a read only memory (ROM) 1308 or other static storage device coupled to bus 1302 for storing static information and instructions for processor 1304. A storage device 1310, such as a magnetic disk, optical disk, or solid-state drive, is provided and coupled to bus 1302 for storing information and instructions.

Computer system 1300 may be coupled via bus 1302 to a display 1312, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 1314, including alphanumeric and other keys, is coupled to bus 1302 for communicating information and command selections to processor 1304. Another type of user input device is cursor control 1316, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 1304 and for controlling cursor movement on display 1312. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.

Computer system 1300 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 1300 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 1300 in response to processor 1304 executing one or more sequences of one or more instructions contained in main memory 1306. Such instructions may be read into main memory 1306 from another storage medium, such as storage device 1310. Execution of the sequences of instructions contained in main memory 1306 causes processor 1304 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical disks, magnetic disks, or solid-state drives, such as storage device 1310. Volatile media includes dynamic memory, such as main memory 1306. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid-state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.

Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 1302. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 1304 for execution. For example, the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 1300 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 1302. Bus 1302 carries the data to main memory 1306, from which processor 1304 retrieves and executes the instructions. The instructions received by main memory 1306 may optionally be stored on storage device 1310 either before or after execution by processor 1304.

Computer system 1300 also includes a communication interface 1318 coupled to bus 1302. Communication interface 1318 provides a two-way data communication coupling to a network link 1320 that is connected to a local network 1322. For example, communication interface 1318 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 1318 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 1318 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 1320 typically provides data communication through one or more networks to other data devices. For example, network link 1320 may provide a connection through local network 1322 to a host computer 1324 or to data equipment operated by an Internet Service Provider (ISP) 1326. ISP 1326 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 1328. Local network 1322 and Internet 1328 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1320 and through communication interface 1318, which carry the digital data to and from computer system 1300, are example forms of transmission media.

Computer system 1300 can send messages and receive data, including program code, through the network(s), network link 1320 and communication interface 1318. In the Internet example, a server 1330 might transmit a requested code for an application program through Internet 1328, ISP 1326, local network 1322 and communication interface 1318.

The received code may be executed by processor 1304 as it is received, and/or stored in storage device 1310, or other non-volatile storage for later execution.

5.0. EXAMPLE EMBODIMENTS

In an embodiment, a method or non-transitory computer readable medium comprises:

receiving raw data from one or more devices; generating a plurality of events from the raw data by: parsing the raw data into a plurality of events, each event of the plurality of events including a portion of the raw data; determining a respective timestamp for each event of the plurality of events; determining a number of events of the plurality of events that were generated during a defined time period; causing display of a user interface that displays one or more metrics based on the determined number of events.
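A minimal sketch of the parse, timestamp, and count steps follows. It assumes, purely for illustration, that each non-empty line of raw data is one event and begins with an ISO 8601 timestamp, and it counts events by that timestamp; the function names are hypothetical, and a real system would determine event boundaries, timestamps, and generation times by its own rules.

```python
from datetime import datetime


def parse_events(raw: str) -> list:
    """Split raw data into events (one per non-empty line) and timestamp each one."""
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the first whitespace-delimited token is an ISO 8601 timestamp.
        stamp, _, _ = line.partition(" ")
        events.append({"time": datetime.fromisoformat(stamp), "raw": line})
    return events


def count_in_period(events: list, start: datetime, end: datetime) -> int:
    """Count events whose timestamp falls in the half-open period [start, end)."""
    return sum(1 for e in events if start <= e["time"] < end)


raw_data = (
    "2015-04-30T23:59:58 user=alice action=login\n"
    "2015-05-01T00:00:01 user=bob action=login\n"
    "2015-05-01T12:00:00 user=alice action=logout\n"
)
events = parse_events(raw_data)
daily_count = count_in_period(events, datetime(2015, 5, 1), datetime(2015, 5, 2))
```

The resulting count is the kind of value a user interface could present as a metric for the defined time period.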

In an embodiment, the method or computer readable medium further comprises: storing the plurality of events in an index.

In an embodiment, the method or non-transitory computer readable medium further comprises: wherein determining the number of events of the plurality that were generated during the defined time period includes determining that the number of events are associated with a particular user account of a plurality of user accounts.

In an embodiment, the method or non-transitory computer readable medium further comprises: wherein determining the number of events of the plurality that were generated during the defined time period includes determining that the number of events are associated with a particular project of a plurality of projects.

In an embodiment, the method or non-transitory computer readable medium further comprises: wherein the plurality of events includes a first set of events associated with a first project and a second set of events associated with a second project, and wherein the user interface displays both a first set of metrics associated with the first project and a second set of metrics associated with the second project.

In an embodiment, the method or non-transitory computer readable medium further comprises: wherein parsing the raw data into a plurality of events further comprises determining event boundaries for the plurality of events.

In an embodiment, the method or non-transitory computer readable medium further comprises: wherein the plurality of events are searchable using a late-binding schema comprising one or more extraction rules for extracting values from the events.

In an embodiment, the method or non-transitory computer readable medium further comprises: wherein the defined time period corresponds to one or more days.

In an embodiment, a method or non-transitory computer readable medium comprises: wherein the defined time period corresponds to one or more seconds.

In an embodiment, a method or non-transitory computer readable medium comprises: calculating an average number of events that were generated over a plurality of time periods.

In an embodiment, a method or non-transitory computer readable medium comprises: calculating a fee amount based on the number of events of the plurality of events that were generated during the defined time period.

In an embodiment, a method or non-transitory computer readable medium comprises: comparing the number of events to a licensed amount; in response to determining that the number of events exceeds the licensed amount, storing excess events in a non-searchable index.

In an embodiment, a method or non-transitory computer readable medium comprises: comparing the number of events to a licensed amount; in response to determining that the number of events exceeds the licensed amount, storing excess events in a non-searchable index; enabling access to the indexed events that are stored in the non-searchable index when additional capacity to increase the licensed amount is purchased.

In an embodiment, a method or non-transitory computer readable medium comprises: comparing the number of events to a licensed amount; in response to determining that the number of events exceeds the licensed amount, automatically increasing the licensed amount.

In an embodiment, a method or non-transitory computer readable medium comprises: comparing the number of events to a licensed amount; in response to determining that the number of events exceeds the licensed amount, generating an alert.

In an embodiment, a method or non-transitory computer readable medium comprises: comparing the number of events to a licensed amount; in response to determining that the number of events exceeds the licensed amount, sending an alert to a particular user.

In an embodiment, a method or non-transitory computer readable medium comprises: calculating a fee amount based on a peak number of events generated during a defined time period.

In an embodiment, a method or non-transitory computer readable medium comprises: calculating a fee amount based on a number of devices from which raw data is received.

In an embodiment, a method or non-transitory computer readable medium comprises: calculating a fee amount based on both of a first fee rate for a first number of events generated and a second fee rate for a second number of events generated.
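One possible reading of a fee based on two rates is a tiered calculation, sketched below. The function name, the tier boundary parameter, and the use of integer rates (for example, hundredths of a cent per event) are assumptions; the specification does not define how the first and second numbers of events are chosen.

```python
def tiered_fee(event_count: int, first_tier_size: int,
               first_rate: int, second_rate: int) -> int:
    """Charge first_rate per event up to first_tier_size, then second_rate beyond it.

    Rates are integers in the smallest billing unit to avoid rounding error.
    """
    first_number = min(event_count, first_tier_size)
    second_number = max(event_count - first_tier_size, 0)
    return first_number * first_rate + second_number * second_rate


# 1,500 events: the first 1,000 at rate 2 and the remaining 500 at rate 1.
fee = tiered_fee(1500, first_tier_size=1000, first_rate=2, second_rate=1)
```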

In an embodiment, a method or non-transitory computer readable medium comprises: wherein the one or more devices are managed by a managed security service provider (MSSP).

In an embodiment, a method or non-transitory computer readable medium comprises: wherein the metrics include a number of events generated during a defined period of time.

In an embodiment, a method or non-transitory computer readable medium comprises: wherein the metrics include a number of events generated during each of one or more previous periods of time.

In an embodiment, a method or non-transitory computer readable medium comprises: wherein the metrics include a comparison of a number of events generated during at least two different time periods.

In an embodiment, a method or non-transitory computer readable medium comprises: calculating a number of events that are stored in one or more particular indexes of the one or more indexes.

In an embodiment, a method or non-transitory computer readable medium comprises: wherein the data is associated with a particular project of a plurality of projects, each project of the plurality of projects having an associated licensed amount of data ingestion.

In an embodiment, a method or non-transitory computer readable medium comprises: wherein the data includes first data received from one or more first devices, and the data further includes second data received from one or more second devices; determining a first number of events associated with the one or more first devices generated during a defined time period; determining a second number of events associated with the one or more second devices generated during the defined time period.

In an embodiment, a method or non-transitory computer readable medium comprises: wherein a first number of events is associated with a first project, and a second number of events is associated with a second project; determining a first number of events associated with the first project generated during a defined time period; determining a second number of events associated with the second project generated during the defined time period.

In an embodiment, a method or non-transitory computer readable medium comprises: wherein the one or more devices includes both a first set of devices associated with a first company and a second set of devices associated with a second company, each of the first set of devices and the second set of devices managed by a managed security service provider (MSSP); wherein the raw data includes first raw data received from the first set of devices and second raw data received from the second set of devices; wherein determining the number of events of the plurality of events that were generated during a defined time period includes determining a first number of events generated based on the first raw data and a second number of events generated based on the second raw data; wherein causing display of the user interface includes separately displaying first metrics based on the first number of events associated with the first company, and second metrics based on the second number of events associated with the second company.

In an embodiment, a method or non-transitory computer readable medium comprises: wherein the data is associated with a particular project of a plurality of projects, each project of the plurality of projects having an associated licensed amount.

6.0. EXTENSIONS AND ALTERNATIVES

In the foregoing specification, embodiments have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the embodiments, and what is intended by the applicants to be the scope of the embodiments, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.

In the drawings, various system components are depicted as being communicatively coupled to various other components by arrows. These arrows illustrate only certain examples of information flows between the components of the depicted systems. Neither the direction of the arrows nor the lack of arrow lines between certain components should be interpreted as indicating the absence of communication between the certain components. Indeed, each component of the depicted systems may feature an open port, API, or other suitable communication interface by which the component may become communicatively coupled to other components of the depicted systems as needed to accomplish any of the functions of the systems described herein.

What is claimed:
 1. A method implemented using a computing device, comprising: receiving raw data from one or more devices; generating a plurality of time-stamped events from the raw data; determining a number of events of the plurality of time-stamped events that were generated during a specified time period and that are associated with a particular account or project; comparing the number of events that were generated during the specified time period to an event count associated with the particular account or project; in response to a determination that the number of events that were generated during the specified time period has reached the event count associated with the particular account or project, determining, based on a license associated with the particular account or project, that the particular account or project is permitted to exceed the event count without using an overflow storage location or increasing the event count; and generating up to a threshold number of excess events from excess raw data received subsequent to the event count being reached.
 2. The method of claim 1, wherein the determining that the particular account or project is permitted to exceed the event count is further based on the particular account or project exceeding the event count less than a limited number of times.
 3. The method of claim 1, further comprising, in response to permitting the particular account or project to exceed the event count without using the overflow storage location or increasing the event count, associating a fee with the particular account or project for exceeding the event count.
 4. The method of claim 1, wherein the determining the number of events of the plurality of time-stamped events that were generated during the specified time period includes determining that the number of events are associated with the particular account, determined from a plurality of user accounts.
 5. The method of claim 1, wherein the determining the number of events of the plurality of time-stamped events that were generated during the specified time period includes determining that the number of events are associated with the particular project, determined from a plurality of projects.
 6. The method of claim 1, wherein the event count specifies a maximum number, or a maximum average number, of events generated during the specified time period.
 7. The method of claim 1, further comprising, in response to a determination that a second number of events generated during a second specified time period has reached a second event count for a second account or project, ceasing to accept raw data from the second account or project.
 8. The method of claim 1, further comprising, in response to a determination that a second number of events generated during a second specified time period has reached a second event count for a second account or project, continuing to accept excess raw data from the second account or project but ceasing to generate new events from the excess raw data from the second account or project.
 9. The method of claim 1, further comprising, in response to a determination that a second number of events generated during a second specified time period has reached a second event count for a second account or project, automatically increasing the second event count for the second account or project.
 10. One or more non-transitory computer-readable storage media, storing instructions, which when executed by one or more processors cause the one or more processors to perform operations comprising: receiving raw data from one or more devices; generating a plurality of time-stamped events from the raw data; determining a number of events of the plurality of time-stamped events that were generated during a specified time period and that are associated with a particular account or project; comparing the number of events that were generated during the specified time period to an event count associated with the particular account or project; in response to a determination that the number of events that were generated during the specified time period has reached the event count associated with the particular account or project, determining, based on a license associated with the particular account or project, that the particular account or project is permitted to exceed the event count without using an overflow storage location or increasing the event count; and generating up to a threshold number of excess events from excess raw data received subsequent to the event count being reached.
 11. The one or more non-transitory computer-readable storage media of claim 10, wherein the determining that the particular account or project is permitted to exceed the event count is further based on the particular account or project exceeding the event count less than a limited number of times.
 12. The one or more non-transitory computer-readable storage media of claim 10, the operations further comprising, in response to permitting the particular account or project to exceed the event count without using the overflow storage location or increasing the event count, associating a fee with the particular account or project for exceeding the event count.
 13. The one or more non-transitory computer-readable storage media of claim 10, wherein the determining the number of events of the plurality of time-stamped events that were generated during the specified time period includes determining that the number of events are associated with the particular account, determined from a plurality of user accounts.
 14. The one or more non-transitory computer-readable storage media of claim 10, wherein the determining the number of events of the plurality of time-stamped events that were generated during the specified time period includes determining that the number of events are associated with the particular project, determined from a plurality of projects.
 15. The one or more non-transitory computer-readable storage media of claim 10, wherein the event count specifies a maximum number, or a maximum average number, of events generated during the specified time period.
 16. The one or more non-transitory computer-readable storage media of claim 10, the operations further comprising, in response to a determination that a second number of events generated during a second specified time period has reached a second event count for a second account or project, ceasing to accept raw data from the second account or project.
 17. The one or more non-transitory computer-readable storage media of claim 10, the operations further comprising, in response to a determination that a second number of events generated during a second specified time period has reached a second event count for a second account or project, continuing to accept excess raw data from the second account or project but ceasing to generate new events from the excess raw data from the second account or project.
 18. The one or more non-transitory computer-readable storage media of claim 10, the operations further comprising, in response to a determination that a second number of events generated during a second specified time period has reached a second event count for a second account or project, automatically increasing the second event count for the second account or project.
 19. A system, comprising: one or more processors; and one or more non-transitory computer-readable storage media storing instructions, which when executed by the one or more processors cause the one or more processors to perform operations comprising: receiving raw data from one or more devices; generating a plurality of time-stamped events from the raw data; determining a number of events of the plurality of time-stamped events that were generated during a specified time period and that are associated with a particular account or project; comparing the number of events that were generated during the specified time period to an event count associated with the particular account or project; in response to a determination that the number of events that were generated during the specified time period has reached the event count associated with the particular account or project, determining, based on a license associated with the particular account or project, that the particular account or project is permitted to exceed the event count without using an overflow storage location or increasing the event count; and generating up to a threshold number of excess events from excess raw data received subsequent to the event count being reached.
 20. The system of claim 19, wherein the determining that the particular account or project is permitted to exceed the event count is further based on the particular account or project exceeding the event count less than a limited number of times.